Systems and methods for obtaining and managing sequence data

ABSTRACT

Systems and methods for biological sample processing are described. A production line extracts genomic DNA from a biological sample, amplifies target components of the sample and produces sequence data for markers from the amplified components. The markers are associated with tests identified in a requisition received with the sample and some markers may be associated with unrequisitioned tests. A sample information management system (SIMS) controls and monitors the production line and subsequent analysis of the results using information in a quality control (QC) database to validate the results. A repository comprising the QC database and a research database receives and aggregates the results without identifying the source of the sample. A portal may be provided to provide access to the research database to a plurality of external contributors. Contributors can selectively provide additional research data and data can be processed using data mining and curation tools.

RELATED APPLICATION

The present application is a continuation of U.S. Nonprovisional application Ser. No. 13/348,626, filed Jan. 11, 2012, which claims priority from U.S. Provisional Patent Application No. 61/431,668, filed Jan. 11, 2011, and from U.S. Provisional Patent Application No. 61/546,820, filed Oct. 13, 2011, which are all expressly incorporated by reference herein for all purposes.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to processes for analyzing DNA sequence data.

2. Description of Related Art

Conventional systems for processing samples generally perform tests described or requested in a requisition. Additional testing on the sample requires that a portion of the sample be stored and processed if a subsequent request for additional tests is received.

BRIEF SUMMARY OF THE INVENTION

Certain embodiments of the invention comprise a sample processing production line that may include a genomic DNA extractor configured to extract DNA from a biological sample. The production line typically includes a target amplifier configured to amplify components of the extracted DNA and a sequencer that produces sequence data for a plurality of markers from the amplified components. The plurality of markers may include markers associated with one or more tests identified in a requisition received with the sample. The plurality of markers may also include markers that are not associated with the one or more tests that were requisitioned, but may instead be associated with other tests which were not requisitioned, or with no test at all.

Certain of these embodiments comprise a sample information management system (SIMS) that controls processing of the sample by the processing production line and analysis of the results of that processing, and a quality control (QC) database that provides the SIMS with QC information. The SIMS may use the QC information to validate processing of samples and analysis of the results. A repository comprising one or more databases receives and aggregates the results generated by processing a plurality of samples. The repository may include the quality control database and a research database. An analyzer may be used to generate test results using information in the repository as well as information obtained from sequencing and from analysis of the sequence data.

Typically, information identifying the source (patient, test subject and/or healthcare provider) of the sample is removed from the sample, the requisition and the results. The SIMS monitors and controls the processing and analysis of the system using a unique identifier assigned to the sample, the requisition and the results. A subset of the results corresponding to requisitioned tests is generated for delivery to the requisitioner. This subset corresponds to the set of tests identified in the requisition and, together with any unrequisitioned results, may be maintained in the repository and may be aggregated in the research database.

Certain embodiments comprise a portal that selectively provides access to data in the research database to a plurality of contributors. The portal communicates with the plurality of contributors via a private and/or public network, such as the Internet, a cellular telephone network, a satellite communications network, an ad hoc network, a WiFi network, a WiMax network or other network. Contributors can selectively provide additional research data to the research database and may use external data mining and curation tools and/or data curation and/or mining tools provided in certain embodiments of the invention to process information provided to the research database.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block schematic illustrating an example of a system according to certain aspects of the invention.

FIG. 2 is a flowchart that illustrates a process flow according to certain aspects of the invention.

FIG. 3 is a block schematic of a simplified example of a computer system employed in certain embodiments of the invention.

FIG. 4 depicts an example of a display received from a touch-enabled family history application that can be executed on a tablet computer or other mobile device.

FIG. 5 depicts an example in which a client or customer uses a requisition engine to record family health history for an individual by drawing on the application with a finger or stylus, and/or by using icons provided by a client application on the touch-screen enabled device.

FIG. 6 depicts an example in which a client or customer can assign genetic status and condition and/or disease information for individuals in the family history.

FIG. 7 is a block schematic illustrating an example of a system according to certain aspects of the invention.

FIG. 8 is a flowchart that illustrates a process flow according to certain aspects of the invention as they relate to filtering of sample data based upon the specific requests made in the requisition in order to limit the analysis of the sample data to those markers applicable to the client requisition.

FIG. 9 is a flowchart that illustrates a process flow according to certain aspects of the invention as they relate to the derivation of attributes which are associated with phenotypic state through the combination of data from one or more laboratory workflows.

FIG. 10 is a flowchart illustrating an example of a system according to certain aspects of the invention as it relates to selective aggregation of results of laboratory analysis relevant to a requisition and generation of the applicable clinical report in response to such requisition.

DETAILED DESCRIPTION OF THE INVENTION

Embodiments of the present invention will now be described in detail with reference to the drawings, which are provided as illustrative examples so as to enable those skilled in the art to practice the invention. Notably, the figures and examples below are not meant to limit the scope of the present invention to a single embodiment, but other embodiments are possible by way of interchange of some or all of the described or illustrated elements. Wherever convenient, the same reference numbers will be used throughout the drawings to refer to the same or like parts. Where certain elements of these embodiments can be partially or fully implemented using known components, only those portions of such known components that are necessary for an understanding of the present invention will be described, and detailed descriptions of other portions of such known components will be omitted so as not to obscure the invention. In the present specification, an embodiment showing a singular component should not be considered limiting; rather, the invention is intended to encompass other embodiments including a plurality of the same component, and vice-versa, unless explicitly stated otherwise herein. Moreover, applicants do not intend for any term in the specification or claims to be ascribed an uncommon or special meaning unless explicitly set forth as such. Further, the present invention encompasses present and future known equivalents to the components referred to herein by way of illustration.

Certain embodiments of the invention provide systems and methods that may be used to perform requisitioned tests and analyses of genetic information in a biological sample and that can perform other tests and analyses on the sample in order to collect additional information. Moreover, certain aspects of the invention enable rapid response to supplemental requisitions and use of the collected information for research purposes. Samples are typically received with a requisition from a client, who is typically a health care provider.

In certain embodiments of the invention, the genetic material to be analyzed is genomic DNA; however, persons of skill in the art will recognize that the invention can be practiced with respect to the sequencing and analysis of other forms of genetic material, including mitochondrial DNA, cancer cell DNA such as may be extracted from biopsied materials, germ-line DNA, cDNA, mRNA, and cell-free DNA fragments. In certain embodiments of the invention, references herein to DNA should be understood to embrace various forms of genetic material amenable to sequencing, including somatic cell DNA, genomic DNA, germline DNA, mitochondrial DNA, cancer cell DNA, cDNA, and mRNA.

With reference to the block schematic drawing of FIG. 1, a sample analysis production line 11 receives samples, amplifies and purifies DNA extracted from the samples and analyzes the purified DNA according to certain predefined protocols. Sample and requisition intake subsystem 10 processes requisitions provided with samples, which are typically submitted by a clinician or other healthcare provider. Requisitions may explicitly identify a set of tests to be performed on the sample, although, in some embodiments, the healthcare provider may establish, in advance, a catalogue of “standard” tests to be performed on samples provided by that provider. In certain embodiments, the submitter of the requisition may request tests associated with one or more medical conditions or diseases, and such requested tests may be categorical, e.g., tests related to cancer risk, or specific, e.g., tests related to a single specific disease. In certain embodiments, an interactive intake subsystem 10 can maintain an analysis of prior submissions by the healthcare provider and may select a set of tests based on predefined preferences of the healthcare provider, frequency of occurrence of prior requested tests, identification of a subject from which the sample was obtained, time of year, and/or high volumes of recent test requests driven, for example, by an epidemic.
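One of the selection criteria above, frequency of occurrence of prior requested tests, can be illustrated with a minimal Python sketch. The function name `suggest_tests`, the test codes, and the rule of keeping the provider's pre-registered "standard" tests and then adding the most frequently ordered remaining tests are all hypothetical assumptions for illustration, not the intake subsystem's actual logic.

```python
from collections import Counter


def suggest_tests(prior_requisitions, preferred=(), top_n=3):
    """Suggest a default test set for a provider (hypothetical helper).

    prior_requisitions: lists of test codes the provider ordered previously.
    preferred: codes the provider pre-registered as "standard"; always kept.
    top_n: how many additional codes to add, ranked by historical frequency.
    """
    # Count each code at most once per requisition
    counts = Counter(code for req in prior_requisitions for code in set(req))
    suggestions = list(dict.fromkeys(preferred))  # preserve order, drop repeats
    # Rank by descending frequency; break ties alphabetically for determinism
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    extra = [code for code, _ in ranked if code not in suggestions][:top_n]
    return suggestions + extra


history = [["BRCA1", "BRCA2"], ["BRCA1", "CFTR"], ["BRCA1", "BRCA2", "HBB"]]
print(suggest_tests(history, preferred=("CFTR",), top_n=2))
```

Other criteria named in the text, such as time of year or an epidemic-driven surge, could be added as extra weighting terms. This sketch leaves them out.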

In certain embodiments, intake subsystem 10 prepares a workflow for each sample, the workflow identifying the requisitioned test sets. Intake subsystem 10 may validate the requisition requirements and may submit identifying, biographic and other information in the requisition for further processing. The workflow specifies one or more tests to be performed on materials included in the sample and the analyses of the test results to be provided in a report to the requestor. The requisitioned tests typically comprise a subset of the available tests that can be performed on materials extracted from the sample. According to certain aspects of the invention, the full set of available tests is performed on the sample, yielding a number of results related to markers, sequence data and tests in excess of those necessary for the tests which were requisitioned. For a variety of reasons, including government regulation, potential liability issues and commercial reasons, it can be desirable to limit the analysis of the assay results to those tests determined to be relevant to the requisition and/or to limit the analyzed results which are provided to the client to those responsive to the requisition. In certain embodiments, such other assay results data will be retained and available for analysis in connection with subsequent requisitions for additional tests without requiring further or additional processing of the original sample or a new sample from the subject.

Certain embodiments of the invention comprise a repository 17, which typically comprises a database, and which receives and maintains information associated with each sample and requisition. The repository 17 maintains test results and associates the test results with an original requisition and sample. Information that can identify a patient who is the source of the sample is typically removed from the sample and from the stored copies of the requisition. Each sample can be assigned a unique identifier that is also associated with the corresponding requisition. Identifying information that directly identifies a patient, client and/or source of the sample is removed from information to be stored in the repository 17 and can be stored in a separate database that can be indexed using the unique identifier. In one embodiment, the identifying information may be encrypted using an encryption key generated from the unique identifier, and the encrypted identifying information may be maintained in the repository 17 together with the information related to the sample and tests. In some embodiments, a key management system can be used to store and retrieve keys to enable samples and results to be accessed in order to satisfy supplemental requisitions and for other purposes. The results of the full set of tests, and the analysis of each test, can also be indexed using the unique identifier.
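The separation of identifying information from the repository can be sketched in Python. Several details are hypothetical assumptions: the `IDENTIFYING_FIELDS` set, the `Stores` container, the use of a random UUID as the unique identifier, and the use of in-memory dictionaries in place of the two databases. The sketch also leaves out the encryption and key-management embodiment.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical list of requisition fields treated as directly identifying
IDENTIFYING_FIELDS = {"patient_name", "date_of_birth", "provider", "address"}


@dataclass
class Stores:
    repository: dict = field(default_factory=dict)      # de-identified records only
    identity_store: dict = field(default_factory=dict)  # separate, indexed by ID


def intake(requisition: dict, stores: Stores) -> str:
    """Assign a unique identifier and split identifying data from the rest."""
    sample_id = str(uuid.uuid4())
    identity = {k: v for k, v in requisition.items() if k in IDENTIFYING_FIELDS}
    deidentified = {k: v for k, v in requisition.items() if k not in IDENTIFYING_FIELDS}
    stores.identity_store[sample_id] = identity
    stores.repository[sample_id] = {"requisition": deidentified, "results": {}}
    return sample_id


def reidentify(sample_id: str, stores: Stores) -> dict:
    """Look up identifying data by ID, e.g. to deliver a report to the client."""
    return stores.identity_store[sample_id]
```

Results for every test can then be written under `stores.repository[sample_id]["results"]`. With this layout, the repository never holds a patient name, and a later supplemental requisition needs only the identifier.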

In certain embodiments, the sample is entered into the production line 11. Production line 11 performs a sequence of processing steps on the sample. An accessioning step is performed whereby the sample can be registered and identified using a barcode, electronic tag such as an RFID, etc. The accessioning step may use the unique identifier discussed above as a means of tagging and identifying the sample and its components as they progress through the production line 11. Separation of the accessioning process from the intake subsystem allows use of other, third party production line elements without compromising anonymity of the presently described processes. In certain embodiments, the invention provides for receipt of sequence information from a third party source as opposed to the performance of the sequencing activities. Such third party sources of sequence information could provide some or all of the sequence information used in the analysis of results in response to a requisition. Additionally, such third party sequence data could be independently generated or generated in response to directions to perform certain tests based upon the requisition.

Additional processes performed in the production line include a samplepreparation step. DNA may then be extracted and purified and provided onan assay plate used for amplification and sequencing.

In certain embodiments, certain production line operations are monitored and controlled by a sample information management system (herein referred to as “SIMS”) 15. SIMS 15 typically comprises one or more processors configured to communicate with instrumentation and database management systems (“DBMS”), and may be configured to control processing of the sample through the production line 11. Portions of SIMS 15 may be embodied in a controller or instrument of the production line. SIMS 15 operates to control flow of information from intake through publication of results.

In certain embodiments, a repository 17 receives and maintains information associated with each received sample and requisition. In certain embodiments, the repository may additionally comprise information from other sources. Repository 17 typically comprises a database, and repository 17 can be used as a source of research data and for quality control purposes. For example, results of tests that have a propensity to fail or to produce inaccurate results may be analyzed using identifiers and quality metrics. Identifiers and quality metrics may be derived from a spectrum of different test results for a population of samples and/or from research based on aggregated results. Quality metrics are typically employed by SIMS 15 to enable monitoring and tracking of results during processing of a sample. In one example, a result of a first test falling outside a predetermined range of acceptable results may be indicative of failure of a second test, the failure of which is otherwise difficult to detect.
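The last example can be expressed as a small rule table in Python: a result from a "sentinel" first test that falls out of range flags a dependent second test for review. The metric name, the acceptable range, and the flagged test name are hypothetical placeholders, not values taken from the repository.

```python
# Hypothetical rule: if the sentinel metric falls outside [low, high], the
# dependent test is flagged as possibly failed even if it looks normal.
QC_RULES = [
    {"sentinel": "control_amplicon_depth", "low": 50.0, "high": 5000.0,
     "flags": "gc_rich_panel"},
]


def qc_flags(metrics, rules=QC_RULES):
    """Return the set of tests flagged by out-of-range (or missing) sentinels."""
    flagged = set()
    for rule in rules:
        value = metrics.get(rule["sentinel"])
        # A missing sentinel value is treated as a failure of the check
        if value is None or not (rule["low"] <= value <= rule["high"]):
            flagged.add(rule["flags"])
    return flagged
```

Keeping the rules as data means the SIMS could refresh them whenever aggregated results in the repository reveal a new correlation between tests. That update path is an assumption of this sketch.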

Data in repository 17 may be directly accessed for research purposes. However, certain research activities may require aggregation and/or redaction of data in repository 17. In one example, research data may be accessed through a portal system 18, which operates as an intermediary between third party researchers and the repository 17. Portal 18 may execute queries against the repository 17, removing identifying information before making the results available to a third party or public research entity. Portal 18 may permit a third party researcher to identify data in the repository 17 that fits demographic and biographic profiles. Portal 18 can determine such relationships using the unique identifiers assigned during sample intake. In one example, repository 17 can be queried by a third party for test results obtained from samples contributed by persons matching a certain demographic profile. The portal 18 may process the query, for example, by locating unique identifiers associated with demographic records matching the demographic profile. The located identifiers may then be used to identify complete test results for a study population of subjects matching the demographic profile, and relevant portions of the results for each identifier can be returned to the researcher.
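Here is a rough Python sketch of the two-stage query described above. Stage one matches demographic records to unique identifiers. Stage two pulls only the requested result fields for those identifiers. The data layouts and the function name `portal_query` are hypothetical, and the choice to withhold the identifiers themselves from the researcher is one possible design rather than the portal's documented behavior.

```python
def portal_query(profile, demographics, repository, fields):
    """Return selected result fields for subjects matching a demographic profile.

    profile: attribute -> required value, e.g. {"sex": "F"} (hypothetical schema).
    demographics: sample_id -> demographic record, held apart from results.
    repository: sample_id -> {"results": {...}} de-identified records.
    fields: result fields the researcher is permitted to receive.
    """
    # Stage 1: locate unique identifiers whose demographic record matches
    matches = [sid for sid, record in demographics.items()
               if all(record.get(k) == v for k, v in profile.items())]
    # Stage 2: return only the permitted fields; identifiers are not disclosed
    return [{f: repository[sid]["results"].get(f) for f in fields}
            for sid in sorted(matches) if sid in repository]
```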

In certain embodiments, portal 18 may receive research data from third party researchers for inclusion in the repository 17. Accordingly, repository 17 may include baseline, diagnostic, statistical and other information used to process test results and for quality control purposes. Repository 17 may comprise a plurality of databases and a DBMS that executes queries and performs searching and mapping functions. In certain embodiments, one or more databases of repository 17 can be populated and maintained using data mining and curation tool 16. External and internal databases may be mined using tool 16, and tool 16 may automate various curation functions on the repository 17, including collection, organization and validation functions.

FIG. 2 is a flow chart that describes a process flow employed in the example shown in FIG. 1. At step 200, a user provides a sample with a requisition. Information in the requisition is analyzed and validated. The requisition defines one or more tests to be performed on the sample and provides identifying information related to the user and a subject who is the source of the sample. The user may have defined a standard or minimum set of tests to be performed on the sample. In certain embodiments, the intake system may be configured to recognize codes corresponding to a plurality of standard test sets associated with a user or available globally. A requisition may include a combination of individual tests and/or test sets. Having validated the requisition, the sample is typically provided with a unique system identifier, which can be used to track the sample testing process and to disassociate the sample and requisition from information that identifies the subject. Such disassociation enables data extracted from tests performed on the sample to be used for a wide variety of research activities while maintaining an indirect link to biographic, demographic, medical history and other personal information.
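Expanding a requisition that mixes individual tests with user-specific or global test-set codes could look like the following Python sketch. It assumes user-defined sets override global sets that share the same code, that duplicate tests are removed while order is kept, and that an unrecognized code fails validation. All codes and names are hypothetical.

```python
# Hypothetical globally available test-set code
GLOBAL_TEST_SETS = {"PANEL-CARDIO": ["MYH7", "MYBPC3"]}


def expand_requisition(codes, user_test_sets=None, known_tests=frozenset()):
    """Expand requisition codes into an ordered, de-duplicated list of tests.

    Raises ValueError for a code that is neither a test set nor a known test,
    mirroring the validation step at intake.
    """
    # User-specific sets take precedence over global sets with the same code
    sets = {**GLOBAL_TEST_SETS, **(user_test_sets or {})}
    tests = []
    for code in codes:
        if code in sets:
            members = sets[code]
        elif code in known_tests:
            members = [code]
        else:
            raise ValueError(f"unrecognized test code: {code}")
        for test in members:
            if test not in tests:
                tests.append(test)
    return tests
```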

At step 202, the sample is prepared for testing and the sample may be divided into portions. One or more portions may be retained for retesting and/or further testing. In certain embodiments of the invention, an identical, complete set of tests is performed on all samples, and preparation and testing for all samples is consequently identical. At step 204, DNA is extracted from the sample and purified. Assays are prepared at step 206 and an amplification step 208 is performed. Ligation step 210 and sequencing step 212 produce raw data for analysis and reporting using a process 30 described with respect to FIG. 3.

According to certain aspects of the invention, portions of the sample are processed to generate sequence data for all markers then available for analysis, or at least for more markers than are directly associated with the requisitioned test, and the sequence data is then selectively analyzed and interpreted according to requirements set out in the requisition. Reported results are typically limited to the results of tests requested in the requisition. Additional results may be delivered in response to a second requisition provided by the user, where the second requisition directly references the first requisition. Genetic condition/marker information may be delivered to the ordering clinician in a variety of ways. In some embodiments, markers are simultaneously sequenced for many conditions, often numbering in the hundreds or thousands, regardless of the tests requisitioned by the clinician. The sequences generated are targeted from the whole genome, which requires a fairly arduous curation, informatics and assay development process. The analyzed sequence is determined by the clinician requisition, supported by a requisition engine that keeps track of clients and the markers ordered for each client. In accordance with the current regulatory environment, and to limit creation of liability issues for clinicians, only the requisitioned markers will be analyzed for genetic information. The remaining markers can be stored as raw data, and may be disassociated from identifying information (made anonymous), aggregated and stored for research and development and other purposes and/or as data for internal use and for use with collaborators.
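The filtering step can be sketched in Python. It divides the sequence data into markers to analyze (those mapped to requisitioned tests) and markers kept only as raw data. The `test_to_markers` mapping and the marker names are hypothetical stand-ins for the requisition engine's catalogue.

```python
def split_by_requisition(sequence_data, requisitioned_tests, test_to_markers):
    """Partition marker -> sequence data into (to_analyze, raw_only).

    test_to_markers: hypothetical catalogue mapping each test to its markers.
    Only to_analyze is passed to interpretation; raw_only is stored unanalyzed.
    """
    wanted = set()
    for test in requisitioned_tests:
        wanted.update(test_to_markers[test])
    to_analyze = {m: s for m, s in sequence_data.items() if m in wanted}
    raw_only = {m: s for m, s in sequence_data.items() if m not in wanted}
    return to_analyze, raw_only
```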

A re-requisition process may be implemented that allows the delivery of the entirety of the data, or a portion of the data other than the requisitioned test data, to the client. This re-requisition process may be provided through the requisition engine, by which further data on a specific condition or data from other conditions can be automatically requested and the data integrated into the client solution automatically.

In certain embodiments, the invention provides a method for analyzing diagnostic information, comprising: (i) extracting genomic DNA from a sample; (ii) obtaining sequence data for a plurality of markers, the plurality of markers including a first set of markers which are associated with one or more tests identified in a first requisition corresponding to the sample and a second set of markers which are not associated with the one or more tests identified in the requisition; (iii) generating a response to the requisition, the response being based upon an analysis of sequence data corresponding to the first set of markers and the one or more tests, with the generation of the response excluding analysis of the second set of markers with respect to tests other than the one or more tests; (iv) assigning an identifier to the sequence data, the identifier identifying the sample, the requisition and information related to the analysis; (v) storing the sequence data in a repository of data, the repository providing data for analysis; and (vi) reporting additional results of an analysis of the sequence data associated with one or more markers of the second set of markers in response to a second requisition, the second requisition identifying a test different from the one or more tests.
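Step (vi) can be illustrated with a minimal Python sketch. A second requisition that carries the identifier from step (iv) is answered from the stored sequence data, so no new lab work is needed. The record layout, the caller-supplied `analyze` function, and the skipping of already-reported tests are hypothetical assumptions.

```python
def handle_second_requisition(repository, sample_id, new_tests,
                              test_to_markers, analyze):
    """Answer a follow-up requisition from stored sequence data.

    repository: sample_id -> {"sequence_data": {...}, "reported_tests": [...]}.
    analyze: hypothetical caller-supplied function(test, marker_data) -> result.
    """
    record = repository[sample_id]  # looked up by the identifier from step (iv)
    already_reported = set(record["reported_tests"])
    results = {}
    for test in new_tests:
        if test in already_reported:
            continue  # answered by the first requisition
        markers = test_to_markers[test]
        marker_data = {m: record["sequence_data"][m] for m in markers}
        results[test] = analyze(test, marker_data)
        record["reported_tests"].append(test)
    return results
```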

In certain such embodiments, the invention provides the step of assigning an identifier which includes rendering the source of the sample and the requisition anonymous. In certain embodiments, the second requisition includes the identifier. In certain embodiments, the information related to the analysis identifies the date of the analysis. In certain embodiments, the step of generating a response to the requisition includes performing quality control on the results of an analysis of the sequence data. In certain embodiments, the step of performing quality control includes performing a comparison using quality information derived from the repository. In certain embodiments, the step of performing quality control includes updating the quality information using the results of the analysis of the sequence data. In certain embodiments, the repository is accessible to contributors through a portal. In certain embodiments, the invention provides the additional step of updating the repository based on contributions made by the contributors. In certain embodiments, the second requisition is made after receiving the results of the first requisition. In certain embodiments, the test identified in the second requisition was not available for identification at the time the first requisition was made. In certain embodiments, the second requisition is made at the time of the first requisition and comprises a request for additional results associated with one or more tests not then available for requisition, with such results to be delivered after at least one of those tests becomes available for requisition.
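The QC embodiments combine two operations: comparing a new result against quality information derived from the repository, and then updating that information. The Python sketch below shows one way to do both. It keeps a running mean and variance (Welford's online algorithm) for one hypothetical quality metric and applies a z-score test. The 3-standard-deviation cutoff and the rule that only passing values update the statistics are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running mean/variance of a quality metric (Welford's algorithm)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; 0.0 until at least two values are seen
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def qc_check(stats: RunningStats, value: float, z_max: float = 3.0) -> bool:
    """Compare value to historical stats; fold passing values into them."""
    if stats.n > 1 and stats.std > 0:
        passed = abs(value - stats.mean) / stats.std <= z_max
    else:
        passed = True  # not enough history to judge
    if passed:
        stats.update(value)  # failing values do not contaminate the baseline
    return passed
```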

In certain embodiments, the invention provides a method for analyzing diagnostic information, comprising the steps of (i) extracting genomic DNA from a sample; (ii) obtaining sequence data for a plurality of markers, the plurality of markers including a first set of markers which are associated with one or more tests identified in a first requisition corresponding to the sample and a second set of markers which are not associated with the one or more tests identified in the requisition, wherein the first requisition includes a request to receive future notifications of health information associated with the sequence data; (iii) generating a response to the requisition, the response being based upon an analysis of sequence data corresponding to the first set of markers and the one or more tests, with the generation of the response excluding analysis of the second set of markers with respect to tests other than the one or more tests; (iv) assigning an identifier to the sequence data, the identifier identifying the sample, the requisition and information related to the analysis; (v) storing the sequence data in a repository of data, the repository providing data for analysis; (vi) after generating the response, generating a notification of health information based upon an analysis of some of the sequence data, wherein the health information which is the subject of the notification comprises health information which is different from the health information in the response; and (vii) generating such health information after consent for its generation is received.

In certain such embodiments, the health information which is the subject of the notification concerns a change to a prediction in the response to the first requisition, where such change arises due to a change in the content of the repository, such as the availability of new computational models, the new availability of prediction methods for new conditions, a change in the clinical interpretation of the originally requisitioned conditions, an update to the background information of the originally requisitioned conditions, or new information about a condition that was not yet available at the time of the requisition. In certain such embodiments, the health information which is the subject of the notification concerns one or more tests which were not available for requisition at the time of the first requisition. In certain such embodiments, the health information which is the subject of the notification is generated with reference to sequence data corresponding to at least one marker of the second set of markers. In certain such embodiments, the consent for generating such health information is provided with the first requisition.
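The first notification case, a changed prediction caused by a repository update, can be sketched in Python. After a model or interpretation update, each previously reported condition is re-scored. A notice is produced only for samples whose requisition included consent. The record layout and the caller-supplied `predict` function are hypothetical, and the sketch does not handle notifications about newly available tests.

```python
def notifications_after_update(sample_records, predict):
    """List (sample_id, condition, old, new) for changed, consented predictions.

    sample_records: dicts with keys "sample_id", "consent" (bool),
        "reported" (condition -> prior prediction) and "sequence_data".
    predict: hypothetical function(condition, sequence_data) -> prediction,
        reflecting the repository's current models and interpretations.
    """
    notices = []
    for rec in sample_records:
        if not rec["consent"]:
            continue  # step (vii): only generate after consent is received
        for condition, old in rec["reported"].items():
            new = predict(condition, rec["sequence_data"])
            if new != old:
                notices.append((rec["sample_id"], condition, old, new))
    return notices
```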

Requisition and Sample

In certain embodiments of the invention, requisitioning users comprise clinicians, healthcare providers and intermediaries such as diagnostic companies, insurance companies, pharmacy benefits managers, or some other healthcare provider. An available test menu may be provided to a user interactively, although users may prepare requisitions independently, using agreed codes to define the tests to be performed. In one example, the system may comprise a database maintaining a menu having 50 or more tests. In another example, the database may maintain a menu comprising substantially all available or practical germline genetic tests. The requisition may be submitted through a client portal, an integrated electronic medical record system (“EMR system”), or in a conventional order form. The requisition may also provide instructions that specify levels of background analysis and threshold notification to be performed for any conditions not specifically requested in the requisition.

The tests and/or analysis to be performed may be defined by the client using criteria that define a marker(s), set of genome coordinates, variant(s), gene(s), condition(s), or group(s) of conditions, and the client requisition may be entered manually, electronically, or pulled automatically from an EMR or other means of electronic requisition. In certain embodiments, a client may request a threshold scan, in the background or otherwise, and the threshold may be quantitative in nature (e.g., testing whether a result is above, below or equal to a threshold expressed as a population percentile, absolute risk, relative risk, odds ratio, etc.) or qualitative (at risk, not at risk, unknown, etc.). The client threshold request may be entered manually, electronically, or pulled automatically from an EMR or other means of electronic requisition.
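The Python sketch below shows how quantitative and qualitative threshold requests might be evaluated across background conditions. The request shapes, metric names, and comparison keywords are hypothetical. A request that contains a `notify_on` set is treated as qualitative. Any other request compares a named numeric metric against a threshold.

```python
import operator

# Hypothetical comparison keywords a client could choose from
COMPARATORS = {"above": operator.gt, "below": operator.lt,
               "at_or_above": operator.ge, "at_or_below": operator.le,
               "equal": operator.eq}


def threshold_scan(results, request):
    """Return conditions (sorted) that meet a client's threshold request.

    results: condition -> metrics, e.g. {"relative_risk": 2.4, "category": "at risk"}.
    request: {"notify_on": {...}} for qualitative scans, or
             {"metric": ..., "threshold": ..., "comparison": ...} for quantitative.
    """
    flagged = []
    for condition, metrics in sorted(results.items()):
        if "notify_on" in request:  # qualitative request
            hit = metrics.get("category") in request["notify_on"]
        else:  # quantitative request; conditions lacking the metric are skipped
            value = metrics.get(request["metric"])
            hit = value is not None and COMPARATORS[request["comparison"]](
                value, request["threshold"])
        if hit:
            flagged.append(condition)
    return flagged
```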

The user may order one or more of the available tests, and may requestfull analysis. The user may identify an order by which examination ofspecific markers should be prioritized and which, if any, backgroundanalysis or specific genes should be checked responsive to the resultsof the prioritized tests.

A sample submitted for analysis may comprise blood, saliva, tissue, and the like. The sample is typically associated with an identifier that links the sample with its corresponding requisition. The identifier may comprise a barcode, an RFID, or other optically or electronically detectable identifier. A requisition engine can be used to track the patient/barcode relationship and to ultimately re-associate the barcode with patient results while maintaining anonymity throughout the entire processing of the sample. Information in the requisition, including instructions identifying markers to be tested, follows the sample through the lab workflow and analysis and informs both the SIMS and an Analysis Engine. The sample is accessioned and submitted to the lab workflow. A submitted sample may comprise blood, saliva, tissue, cells, and/or genetic material in the form of genomic DNA, prepared DNA, synthesized DNA, RNA or other genetic material.

Lab Workflow

Referring again to FIG. 1, the production line 11 typically performs a variety of functions under the control of, or according to instructions provided by, SIMS 15. In one example, genomic DNA is purified from each sample and regions of interest are amplified with primer sets developed using an assay development component that is informed by the repository 17. The primer sets are typically designed using proprietary enhancements to Primer3. Amplification reactions are performed at a very high multiplex: thousands of amplicons in total, at anywhere from 20 to 300 amplicons per well. The amplified regions of interest for each subject are then ‘molecularly barcoded’ using a ligation reaction. The resulting product of thousands of amplicons for each patient across all regions of interest is then prepared for sequencing and loaded onto the sequencing platform. Sequence preparation may be customized using a manufacturer protocol. Sequencing is then performed and all samples and work product can be stored for future reference. Storage of samples and work product is typically subject to government regulations, such as those implementing the Clinical Laboratory Improvement Amendments (CLIA), which are administered by U.S. federal agencies including the Centers for Medicare & Medicaid Services and the Centers for Disease Control and Prevention. Throughout the entire process, each sample is tracked by the SIMS. In particular, quality metrics, lab inputs, operator, equipment information and all associated data are stored for immediate analysis and reporting or for future reference.
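Because every subject's amplicons carry a molecular barcode, reads from a pooled run can be sorted back to subjects. The following Python sketch shows that demultiplexing step. It assumes the barcode is a fixed-length prefix on each read and requires an exact match. Both are simplifying assumptions: production pipelines commonly allow mismatches or place barcodes elsewhere. The barcode-to-sample mapping is hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len):
    """Sort reads by a fixed-length leading barcode (exact match only).

    Returns (sample_id -> list of barcode-trimmed reads, unassigned reads).
    """
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned.append(read)  # unknown or sequencing-error barcode
        else:
            by_sample[sample].append(insert)
    return dict(by_sample), unassigned
```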

Analysis

In certain embodiments, raw sequences are first analyzed for general quality control metrics using key data provided by the SIMS and based on information maintained in the repository 17. Quality control metrics may be derived from research, statistical analysis of prior test results, controls and other information. Quality metrics may include expected allele frequencies, sequence comparison to a reference genome, genetic fingerprint checks, routine sequencing coverage and quality metrics, and so on. Quality metrics may be reviewed to ensure the quality of the run and to detect trends and issues with the production line 11. Some quality metrics may be specific and unique to the combination of production line 11 and workflow employed. In particular, wild-type signal patterns of each locus, based on several hundred individuals, can be compared to the signal pattern observed in a requisitioned sample. Statistical modeling may be used to assess the probability of a different genotype at a locus of interest. The statistical model typically allows genotype determination of specific mutation types such as insertions/deletions of varying sizes, single nucleotide changes, trinucleotide repeat changes, and copy number differences. Requisition intake engine 10 can be queried to determine which sequence should be interpreted for each subject sample. Subject sequence can be sorted by “molecular barcode” and analyzed by referencing the repository 17, and condition/marker data for each requisitioned test is ported to a data handling component 12. The data handling component may comprise a data menu that maintains condition/risk data available for client integration as well as access to the raw data should a re-requisition come from the requisition intake engine 10. Re-requisitions typically are received in response to a background notification or threshold notification indicating availability or potential availability of significant and/or relevant results of unrequisitioned tests.
For each test requisitioned, the sequence can be analyzed to provide the genotype at the relevant loci, haplotype sequence, diplotype sequence, qualitative risk score, and quantitative risk score if appropriate for the requisitioned condition. All of the data, as well as any data coming back from the client or collaborators, is stored in an anonymized fashion in the repository database 17.
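
The sorting of subject sequence by “molecular barcode” described above can be sketched as follows. This is a minimal illustration only, assuming that the barcode occupies a fixed number of leading bases of each read and is matched exactly; the function and parameter names are hypothetical and do not describe the system's actual interface.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len):
    """Group reads by the molecular barcode assumed to occupy their first bases.

    reads: iterable of read strings; the first `barcode_len` bases are the barcode.
    barcode_to_sample: dict mapping barcode sequence -> anonymized sample id.
    Reads with an unrecognized barcode are collected under the key None.
    """
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        # Exact matching only; a production demultiplexer would likely tolerate mismatches.
        by_sample[barcode_to_sample.get(tag)].append(insert)
    return dict(by_sample)


barcodes = {"ACGT": "S1", "TTGC": "S2"}
reads = ["ACGTAAAA", "TTGCCCCC", "ACGTGGGG", "NNNNTTTT"]
print(demultiplex(reads, barcodes, 4))
# {'S1': ['AAAA', 'GGGG'], 'S2': ['CCCC'], None: ['TTTT']}
```

Keying by an anonymized sample identifier rather than a patient name mirrors the anonymity requirement above; re-association with the patient is left to the requisition engine.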

In certain embodiments, per-base quality metrics may be provided for clinical interpretation. Examples and features of quality metrics include: (1) for clinical interpretation, a base or set of bases (defined by the curated variant database) may be required to be equal to or higher than a pre-defined quality threshold; (2) a pre-defined quality threshold may be used to predict a particular sensitivity, specificity, false negative rate, false positive rate, negative predictive value, positive predictive value and accuracy; (3) bases or sets of bases (as defined by the curated variant database) that do not meet a pre-defined quality threshold may be considered no-calls, and consequently no clinical interpretation may be provided; and (4) the quality threshold may be platform specific with respect to the sequencing technology employed and may be determined, partially or wholly, by coverage, base quality values, or other parameters.
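
Features (1) through (3) above can be illustrated with a short sketch. It assumes per-base quality values are integers (for example, Phred-scaled) and that validation counts of true/false positives and negatives are available for a given threshold; the threshold value and all names are hypothetical.

```python
def call_variant(bases, quals, threshold):
    """Interpret a curated base set only if every base meets the threshold.

    Returns the called bases, or None for a no-call (no clinical interpretation).
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have equal length")
    return bases if all(q >= threshold for q in quals) else None


def performance_metrics(tp, fp, tn, fn):
    """Metrics a validation study might associate with a chosen quality threshold."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_negative_rate": fn / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


print(call_variant("AG", [35, 32], 30))  # AG
print(call_variant("AG", [35, 12], 30))  # None (no-call)
```

Treating the whole curated base set as a unit means a single low-quality base suppresses interpretation of the variant, which matches feature (3).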

Client Integration

Based on the requisition received with a sample, appropriate data is pushed from data handling component 12 to a delivery subsystem or module 13. The delivery subsystem can comprise any combination of software and hardware necessary to format, prepare and transmit results data to a requisitioning user (“client”). The method and content of data delivery are typically configured by the client and based on the level of service contracted by the client. Portions of delivery subsystem 13 can be integrated with a client computing system such as an EMR system. For example, an agent can be delivered to the client EMR system that includes computer instructions and data that cause the EMR system to receive and record results and to produce a report of results that can be printed and/or transmitted as an email, SMS text message or other electronic message. The agent can provide and/or utilize various APIs that port results directly to a client's EMR system or to a client's physician/patient portal, or that generate a fax or email to be delivered to a client (the ‘lab report’). Results delivery system 13 can be adapted and configured to provide background or threshold notification according to client preferences and client requests provided in the requisition, and in compliance with applicable regulations. If background or threshold notifications are enabled, the client can be informed of such notices and an automatic ‘re-requisition’ can be generated and, when authorized by the client, can cause delivery of the remaining information through the results delivery system 13.

In certain embodiments, the clinical interpretation of test results may be augmented by information or samples provided by the client beyond the submitted proband's sample. Such additional information may include pedigree structure and affection status, symptoms, and results of other diagnostic tests. Moreover, additional samples or sequence results from sources related to the patient or proband (such as parents, siblings, and/or offspring) may be submitted to facilitate clinical interpretation.

In certain embodiments, the invention provides a method for updating and delivering additional or supplemental sample analyses, in which a client indicates on a clinical requisition for the performance of certain tests according to the invention her desire to receive future notifications of health information beyond the initial results related to the requested tests, including but not limited to (i) updates to predictions or assessments relevant to the original requisition based on new computational methods; and (ii) results related to a specific or general list of conditions maintained by the service provider for which prediction methods are not available at the time of the original requisition but which may subsequently become available. In certain such embodiments, patient data is derived from data archived from the processing of a sample (such as blood, plasma, stool, or other sample) in association with the original requisition, where such data may be archived by the service provider or a third party. In certain of such embodiments, the requisitioning institution or physician is notified of the availability of updated clinical information beyond that reported initially in response to the original requisition. Such notification may include one or more of the following: (i) patient identifier, (ii) patient date of birth, (iii) patient gender, (iv) original sample type(s) and date(s), (v) original requisitioned condition(s), (vi) a description of the updated information, and (vii) an indication of the relevance of the updated information, including: (x) possible changes in the clinical interpretation of the original requisitioned conditions, (y) updates to the background information of the original requisitioned conditions, and (z) new information about a condition that was not yet available at the time of the original requisition.
In certain embodiments, the delivery of this information to the institution or physician is made upon agreement to receive the updated information, or may be made without such agreement if previously waived.

For example, in certain embodiments, a client can choose either to receive clinical results immediately for all conditions where notification thresholds are met, or to receive “push notifications” about conditions where notification thresholds are met following the original report in response to the requisition. Following receipt of such a push notification, the client can choose to re-requisition information on some or all of the conditions identified in the push notification through any of the supported requisitioning means and is then presented with the full clinical report by any of the delivery methods of the invention.
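
The two delivery preferences can be sketched as a simple routing step. This is a hypothetical illustration: the setting names “immediate” and “push”, the (condition, score, threshold) layout, and the rule that a threshold is met when the score reaches it are all assumptions, not details from the specification.

```python
from dataclasses import dataclass


@dataclass
class ClientPreferences:
    # Hypothetical setting: "immediate" or "push", chosen by the client.
    notification_mode: str


def route_findings(prefs, findings):
    """Split post-report findings into immediate results and push notices.

    findings: list of (condition, score, threshold) tuples; a notification
    threshold is treated here as met when score >= threshold (an assumption).
    Returns (conditions_reported_now, conditions_announced_by_push).
    """
    met = [cond for cond, score, threshold in findings if score >= threshold]
    if prefs.notification_mode == "immediate":
        return met, []
    if prefs.notification_mode == "push":
        return [], met
    raise ValueError(f"unknown notification mode: {prefs.notification_mode!r}")


def re_requisition(push_notice, selected):
    """The client re-requisitions some or all conditions named in a push notice."""
    return [cond for cond in push_notice if cond in selected]
```

Under the “push” preference, only the condition names leave the system until the client re-requisitions them; the full clinical report for the selected conditions would then be produced by the report generation step.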

Clinical Report Generation. In certain embodiments, a clinical report may be represented as data comprising an internal representation of all clinically relevant information, or as a rendering of that data in a human- or machine-readable form. In certain embodiments, the analysis pipeline (pipe) produces a clinical report by applying requisitioned (or selected in view of the requisition) risk models and auxiliary analysis (including, but not limited to, variant of unknown significance (VUS) analysis). The internal representation of a clinical report contains all the information needed to deliver the clinical report to the client or requisitioning organization by one or more delivery means, including (1) a human-readable document (in a format such as PDF, HTML, etc.) that is delivered to the client physically, electronically over the Internet, or by fax, (2) a human-readable presentation on the client portal, and/or (3) a data integration (over the Internet or otherwise) with client computer systems (such as electronic medical records, internal client portals, etc.). In certain embodiments, generation of a clinical report may be triggered by many different events, including but not limited to (1) the completion of the analysis pipeline (pipe), which triggers clinical report delivery to the client or requisitioning party, (2) the re-requisition of additional information by the client or requisitioning party, (3) a request for display on the client portal, in real time or in batch mode, or (4) manual initiation, as needed. Clinical reports can be generated from the underlying report data in multiple forms for a single patient, such as by sending a human-readable report in parallel with pushing the report data into a client EMR system.

Requisition Engine

In certain embodiments, a portion of the requisition intake element 10 can be provided as a requisition engine that is portable and can be integrated in a computer system operated independently by a client, particularly where the client is a business entity. The requisition engine can also be provided as a standalone component that may be embodied in a mobile device. Suitable mobile devices can include notebook computers, cellular telephones (including smart phones), tablet computers and customized devices. The requisition engine may include or be coupled to an imaging device such as a scanner, a barcode reader, a camera or other device capable of capturing images that include coded information associated with a patient, a sample, a location, an order form or other tangible component. The requisition engine may include an input device such as a keyboard, touch screen, pen interface, mouse, voice recognition system or other input device through which a client can create and complete a requisition electronically. The client can operate a requisition engine as an order management tool that interacts with sales and marketing channels to interact with end-users.

In certain embodiments, the client may define in a requisition one or more of a marker(s), set of genome coordinates, variant(s), gene(s), condition(s), and/or group(s) of conditions that determine in part the tests to be run. Additionally or alternatively, the Requisition Engine software may suggest one or more of a marker(s), set of genome coordinates, variant(s), gene(s), condition(s), and/or group(s) of conditions to be specified in the requisition, and thereby determine in part the tests to be run, based on input of marker(s), set of genome coordinates, variant(s), gene(s), condition(s), group(s) of conditions, symptoms, phenotypes, family history or other client-specific information. Such client-specific information may be entered manually or electronically, or pulled automatically from an EMR or other means of electronic requisition.

The requisition and sample intake 10 receives test requisitions from the client and relays to the SIMS the set of tests ordered by the client, together with identifying information associated with the inbound sample. Upon completion of the analysis and transmission of the results to the client, the requisition engine may be activated to notify the client of a background or threshold notification where configured by the client. When a background or threshold notification is activated and the client is informed, an automatic or manual “re-requisition” can be created to enable delivery of the remaining information. The requisition engine can also be configured to track patients and version information of the production line 11, or of elements of the production line 11 platform, used to perform tests on submitted samples. After development of new or updated test procedures and processes, it is often preferable to retest samples to obtain higher quality results and, where a new test is indicated, certain benefits accrue from using a new sample for testing. Consequently, process version information may indicate whether the requisitioning physician should submit a new sample in preference to generating a re-requisition for data that were already created during prior tests.
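
A version check of this kind might look like the following sketch. It assumes each sample carries a stamp of the production-line and Locus Database versions used to process it, and that a test (new or updated) declares a minimum production-line version; the class, field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessStamp:
    # Versions as tuples, e.g. (2, 1) for version 2.1, so they compare element-wise.
    production_line: tuple
    locus_db: tuple


def recommend_action(stamp, required_line_version):
    """Suggest re-requisitioning existing data or submitting a new sample.

    If the sample was processed on a production line older than the version
    the test requires, existing data may be of lower quality, so a fresh
    sample is recommended instead of a re-requisition.
    """
    if stamp.production_line >= required_line_version:
        return "re-requisition"
    return "new sample"
```

Using tuples rather than strings avoids the pitfall that "10.0" sorts before "9.0" lexically; a real policy would likely weigh more than a single minimum version.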

In certain embodiments, a portion of the requisition intake element 10 can be provided as a requisition engine that is portable and can be integrated in a computer system operated independently by a client. The client can be an individual, a group of individuals and/or a business entity. The requisition engine can be provided as a stand-alone component that may be embodied in a mobile device. Suitable mobile devices can include notebook computers, cellular telephones (including smart phones), tablet computers and customized devices. In one example, input to the requisition engine depicted in FIG. 4 is received from a touch-enabled family history application that can be executed on a tablet computer or other mobile device.

FIG. 5 depicts an example in which a client or customer uses a requisition engine to record family health history for an individual by drawing on the application with a finger or stylus, and/or by using icons provided by a client application on the touch-screen enabled device. FIG. 6 depicts an example in which a client or customer can assign genetic status and condition and/or disease information for individuals in the family history. The genetic and disease/condition information collected by a requisition engine according to certain aspects of the invention can be associated with the patient and directly submitted as a part of the requisition. The information can also be routed to the client's EMR and be included with the requisition generated from the EMR. The information can be printed out and attached to paper/fax requisitions, as desired.

In practical use, the family history tool can be used to port disease data for all members of the family, and the data can be directly transmitted to the patient's EMR or PHR. Probabilities for carrier status, likelihood of the disease state, and mode of inheritance can be calculated in real time as the clinician enters the information. Genetic test data can also be ported back to the family history tool, further enabling the direct calculation of risk to an individual represented in the tool, as well as the calculation of inheritance of complex and/or multigenic traits. Sets of information, including those described herein, can also flow through to the EMR/PHR of other persons in the family history, i.e., the medical records of other family members can be updated with information collected during other family members' medical history intake. The family history tool can inform the clinician of the full list of diseases and/or genes that can be ordered and can automatically include that information in a requisition generated from within the tool. The family history tool can also combine other data available to the clinician, such as weight, cholesterol levels, blood chemistry, biomarkers, environmental factors, etc., to further refine disease risk estimates in the tool for specific diseases (e.g., the Gail model for breast cancer, or other factors for warfarin dosing).
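
As one example of a carrier-status probability a family history tool could compute in real time, the sketch below applies simple Mendelian segregation for a single biallelic autosomal recessive locus. It assumes complete penetrance (so “unaffected” means “not homozygous for the risk allele”), no new mutations, and parental genotype probabilities supplied by the tool; the function names are hypothetical and this is not the tool's documented algorithm.

```python
from fractions import Fraction
from itertools import product


def transmit_prob(genotype):
    """Chance a parent with `genotype` risk-allele copies (0, 1 or 2) transmits it."""
    return Fraction(genotype, 2)


def child_distribution(mother, father):
    """mother/father: dict genotype -> probability. Returns the child's genotype distribution."""
    dist = {0: Fraction(0), 1: Fraction(0), 2: Fraction(0)}
    for (gm, pm), (gf, pf) in product(mother.items(), father.items()):
        tm, tf = transmit_prob(gm), transmit_prob(gf)
        w = pm * pf
        dist[2] += w * tm * tf
        dist[1] += w * (tm * (1 - tf) + (1 - tm) * tf)
        dist[0] += w * (1 - tm) * (1 - tf)
    return dist


def carrier_prob_if_unaffected(dist):
    """Autosomal recessive, full penetrance: condition on the child not being homozygous."""
    return dist[1] / (dist[0] + dist[1])


carriers = {1: Fraction(1)}  # both parents are known (obligate) carriers
print(carrier_prob_if_unaffected(child_distribution(carriers, carriers)))  # 2/3
```

Exact fractions keep the classic textbook results (such as the 2/3 carrier risk for an unaffected sibling of an affected child) free of rounding noise.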

In certain embodiments, the invention provides software and/or processes to limit analysis to variants, genes, markers and/or conditions that have been requisitioned by the client or are determined to be applicable to the requisition. In certain of such embodiments, the applicable variants, genes, markers and/or conditions to be analyzed may be determined by requisitioning and analysis software that works together with the curation database to filter the data generated with respect to the sample and limit the analysis to those aspects of patient data (e.g., genetic sequence) that are relevant to the requisition, as illustrated in one example in the flow-chart of FIG. 8. Data generated from a patient sample may include data that does not apply to the particular requisition. This unrequisitioned data can be filtered out upstream of the risk engine that performs the analysis of the data, and remains unanalyzed insofar as it can be attributed to a patient.
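
The upstream filter can be sketched as a set intersection against the curation database. The layout of the curation lookup (variant identifier to the set of condition names it is curated for) and all names here are assumptions made for illustration.

```python
def filter_to_requisition(variants, requisitioned_conditions, curation_db):
    """Keep only variants curated for at least one requisitioned condition.

    variants: list of dicts with a 'variant_id' key.
    curation_db: dict variant_id -> set of condition names (hypothetical layout).
    Everything else is dropped before it reaches the risk engine.
    """
    wanted = set(requisitioned_conditions)
    return [v for v in variants if curation_db.get(v["variant_id"], set()) & wanted]


db = {"v1": {"Cystic fibrosis"}, "v2": {"Hemochromatosis"}}
calls = [{"variant_id": "v1"}, {"variant_id": "v2"}, {"variant_id": "v3"}]
print(filter_to_requisition(calls, ["Cystic fibrosis"], db))  # [{'variant_id': 'v1'}]
```

Variants absent from the curation database (such as v3) are excluded by default, so only curated, requisition-relevant data is analyzed.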

Curation

The repository 17 and its constituent databases can be populated using data that is obtained from external sources. Certain information used to populate the repository 17 may be found in any of thousands of publicly available databases, test registries, private databases, academic journals, and so on. Accordingly, certain embodiments comprise various curation tools that manage intake and maintenance of such information and that can automate curation and annotation of information. The process is typically supervised and reviewed by geneticists and epidemiologists. The curation tools may include database scraping tools, natural language processing tools, automated literature search tools, file maintenance tools, activity loggers, group editing tools, etc.

The process of curating clinical variants, their associated markers, and their relevance to various conditions and risk factors may, in certain embodiments, entail one or more of the following steps, which may be performed in any order suitable to the process: (a) determining relevant genes for a condition using a variety of sources, (b) determining relevant papers in the scientific literature, (c) reading the papers to assess outcome relevance, (d) mapping published variants to genome coordinates, (e) determining whether a variant is pathogenic or of questionable significance, (f) collecting all information needed to assess the significance of a variant, (g) determining the risk model of the condition, (h) documenting and communicating the primary curation, (i) quality checking the primary curation, and (j) loading the condition. The development of a curation database comprising a collection of pathogenic variants may be initiated and/or augmented by exploring conditions, genes, chromosome regions, or symptoms and their associations with certain markers.

In certain embodiments, the invention provides a method of performing the curation process with respect to a sample, where the method incorporates one or more of the following steps: (a) collecting all relevant pathogenic alleles in all relevant genes to be tested and analyzed, such that they are determined before the sample is tested, (b) determining the risk model for such novel pathogenic alleles in all relevant genes before the applicable samples are tested, (c) determining the standards for such novel pathogenic alleles in all relevant genes before the applicable samples are tested, and (d) determining the haplotypes for such pathogenic alleles in all relevant genes before samples are tested. The order of steps (a)-(d) may be varied, certain steps may be conducted concurrently, or aspects of such steps may be performed in different orders with respect to different pathogenic alleles. In certain embodiments, criteria for inclusion or exclusion of information in a curation database may be applied consistent with one or more of the following principles. A pathogenic variant for a condition with a qualitative risk model is either (1) an allele that is observed in an affected individual, can be mapped unambiguously to the human genome, and results in an alteration (deletion, insertion, base change) of a consensus splice acceptor or donor site, alters the initiation codon, introduces a stop codon, results in a frameshift of the protein, or results in a missing exon; or (2) an allele that is observed in an affected individual and has been demonstrated to have a functional effect in an experimental assay.
A questionable variant for a condition with a qualitative risk model is an allele that is observed in an affected individual, results in a missense protein change, silent protein change, in-frame deletion or insertion, removal of a stop codon, promoter change, or intronic change outside of the consensus splice sites, can be mapped unambiguously to the human genome, and has not been demonstrated to have a functional effect in an experimental assay. A variant can be mapped unambiguously to the human genome when two independent pieces of mapping evidence are available. Two of the following types of evidence are required: amino acid change, nucleotide position and change, sequence trace, another identifier (such as an rsID or HGVS annotation), alignment, or another piece of information. A pathogenic variant for a condition with a quantitative risk model is an allele that has been shown to meet genome-wide statistical significance in one study, has been replicated (with statistical evidence) in another independent study, and can be mapped unambiguously to the human genome.
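
The inclusion principles above can be encoded as a small rule function. This sketch makes several assumptions: consequence names are hypothetical labels, “unclassified” is a label introduced here for alleles the stated rules do not cover, replication is reduced to a yes/no flag, and genome-wide significance is taken as the commonly used threshold of p < 5 × 10⁻⁸, which the specification does not state.

```python
from dataclasses import dataclass, field

# Hypothetical consequence labels for the two qualitative categories.
LOF_CONSEQUENCES = {"splice_site", "start_lost", "stop_gained", "frameshift", "exon_loss"}
UNCERTAIN_CONSEQUENCES = {"missense", "synonymous", "inframe_indel", "stop_lost",
                          "promoter", "intronic"}
GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold (assumption)


@dataclass
class VariantEvidence:
    consequence: str
    seen_in_affected: bool
    functional_effect: bool = False
    mapping_evidence: set = field(default_factory=set)  # e.g. {"hgvs", "rsid", "trace"}
    discovery_p: float = 1.0
    replicated: bool = False


def mapped_unambiguously(ev):
    # Two independent types of mapping evidence are required.
    return len(ev.mapping_evidence) >= 2


def classify_qualitative(ev):
    if not ev.seen_in_affected:
        return "unclassified"
    if ev.functional_effect:
        return "pathogenic"  # demonstrated functional effect in an experimental assay
    if not mapped_unambiguously(ev):
        return "unclassified"
    if ev.consequence in LOF_CONSEQUENCES:
        return "pathogenic"
    if ev.consequence in UNCERTAIN_CONSEQUENCES:
        return "questionable"
    return "unclassified"


def classify_quantitative(ev):
    if ev.discovery_p < GENOME_WIDE_P and ev.replicated and mapped_unambiguously(ev):
        return "pathogenic"
    return "unclassified"
```

Checking functional evidence first reflects that, under the stated principles, a demonstrated functional effect is sufficient on its own, while the consequence-based route also requires unambiguous mapping.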

In certain embodiments, the invention provides a curation database comprising risk models assigned to a marker, variant, location, gene or condition. Risk models may have some of the following characteristics. A risk model describes the phenotype produced by a particular genotype at a locus or set of loci. Risk models may be quantitative or qualitative. A condition may have more than one risk model. Risk models may incorporate alleles, genotypes, haplotypes, or diplotypes at one locus or a set of loci in any combination. Qualitative risk models may use generic wording such as “at risk” or “not at risk”, or may be specific to the outcome, such as “at risk for lower lung function”. Quantitative risk models may use odds ratios, hazard ratios, relative risks or other quantitative measurements of an analyte.

In certain embodiments, the invention provides various tools to facilitate the curation process, including the development of content in the curation database. Such tools, which may be used in performing the curation process or aspects thereof, include: (a) a curation tool that allows the user to enter an amino acid change in one-letter or three-letter code together with its position and returns the nucleotide position (relative to the coding or RNA sequence) of a GenBank accession number, (b) a curation tool that allows a user to enter the HGVS-formatted position of an NCBI reference sequence, with or without the nucleotide changes, and returns the genomic coordinates (chromosome and chromosome start and stop positions), (c) a curation tool that allows a user to enter the HGVS-formatted position of an NCBI reference sequence, with or without the nucleotide changes, and returns the reference allele present in a reference genome, (d) a curation tool that allows a user to visualize different coordinate systems (RNA, DNA, protein) at different zoom levels, (e) a curation tool that allows a user to track relevant publications (identifiers, publication, supplements) by condition, (f) a curation tool that reverse complements nucleotide changes when the user enters the strand information for a gene and the variant alleles, (g) a curation tool that generates database entry forms from curator Excel worksheets, (h) a curation tool that checks for consistency between different curated values, (i) a curation tool that predicts functional protein changes from nucleotide changes and an accession number (RNA or DNA), and (j) a set of templates for entry into a form, software program, database, or other electronic media (see template examples).
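
Tool (f) is simple enough to sketch directly. The sketch assumes alleles are written as plain nucleotide strings (IUPAC ambiguity codes other than N are not handled) and that the curator supplies the gene's strand as '+' or '-'; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq):
    """Reverse complement of a nucleotide string (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def to_plus_strand(ref, alt, strand):
    """Express gene-strand alleles on the genomic plus strand.

    strand: '+' or '-' as entered by the curator for the gene.
    """
    if strand == "+":
        return ref, alt
    if strand == "-":
        return reverse_complement(ref), reverse_complement(alt)
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


print(to_plus_strand("CTG", "C", "-"))  # ('CAG', 'G')
```

Reversing as well as complementing matters for multi-base alleles: CTG on the minus strand reads CAG on the plus strand, not GAC.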

Curation tools may be used through any means of accessing the curation database, and information may be entered into the database manually, through curation tools or through portal 18. Information that may be included in the curation database may include any or all of the following values for each variant that may be needed to go from sequence to report: (a) mutation type, (b) position information (including specification of any or all of (i) chromosome and chromosomal position relative to genome build, (ii) position relative to DNA accession, and (iii) position relative to RNA accession), (c) observed alleles, (d) observed haplotypes, (e) haplotype phenotypes, (f) reference source for haplotype phenotypes, (g) variant aliases, (h) condition name, (i) risk model(s) appropriate for the condition, (j) clinical subtypes included with the condition, (k) relevant gender for the condition, (l) epidemiological data that may be relevant for the condition (including any or all of the following, as applicable: (i) ancestry, (ii) age, (iii) lifetime risk, (iv) prevalence, and/or (v) incidence), (m) for each quantitative allele/genotype/haplotype/diplotype, as applicable, the odds ratio (or other quantitative metric), relevant ancestry, reference, and/or frequency, (n) for each qualitative allele/genotype/haplotype/diplotype, as applicable, the risk allele, the nonrisk allele, the unknown allele, the reduced penetrance allele, the full penetrance allele, and/or the mutable allele, and (o) the applicable risk model type, i.e., whether it is quantitative, qualitative, multi-locus quantitative, or multi-locus qualitative.

In certain embodiments, the curation process may include steps of determining condition groups. A condition group is a set of related conditions that should be tested and reported together. For example, a condition group could be the CFTR condition group comprising the following conditions: Cystic fibrosis, Congenital bilateral absence of the vas deferens, Cystic fibrosis (modifier MBL2), Cystic fibrosis (modifier TGFB1), and modifier of CFTR related conditions.

In certain embodiments, a collaborator portal 18 can serve up data from external researchers and experts. For example, experts in a specific disease can upload relevant curation data and/or can perform curation using the curation tools available to them through the portal, although it will be appreciated that the experts can use their own tools, which may be configured to automatically deposit relevant information into the repository. Such curation tools can include curation tools as described herein as well as other tools suitable for curation processes.

Locus Database

In certain embodiments, repository 17 comprises a database which maintains information that enables assay design for sequencing of loci of interest (the “Locus Database”). The Locus Database may additionally maintain clinical information that can be accessed by an analysis engine used for interpretation of test results. Typically, the Locus Database is continually updated and updates are tracked using version numbers. Each requisitioned sample processed and tracked by the SIMS can be marked or “stamped” with information indicating the version of the Locus Database (and other system components) in use when the test was performed. The Locus Database may inform the SIMS for quality control of sequencing data by checking against reference, allele frequencies, molecular fingerprinting, etc. In addition, the Locus Database can maintain any relevant information for submission to state and federal regulatory authorities in validation packets or other correspondence. The Locus Database can be populated using the curation tools, manual entry and information entered through portal 18.

The fields for the Locus Database may include any or all of the following values for each variant that may be needed to go from sequence to report: (a) mutation type, (b) position information (including specification of any or all of (i) chromosome and chromosomal position relative to genome build, (ii) position relative to DNA accession, and (iii) position relative to RNA accession), (c) observed alleles, (d) observed haplotypes, (e) haplotype phenotypes, (f) reference source for haplotype phenotypes, (g) variant aliases, (h) condition name, (i) risk model(s) appropriate for the condition, (j) clinical subtypes included with the condition, (k) relevant gender for the condition, (l) epidemiological data that may be relevant for the condition (including any or all of the following, as applicable: (i) ancestry, (ii) age, (iii) lifetime risk, (iv) prevalence, and/or (v) incidence), (m) for each quantitative allele/genotype/haplotype/diplotype, as applicable, the odds ratio (or other quantitative metric), relevant ancestry, reference, and/or frequency, (n) for each qualitative allele/genotype/haplotype/diplotype, as applicable, the risk allele, the nonrisk allele, the unknown allele, the reduced penetrance allele, the full penetrance allele, and/or the mutable allele, and (o) the applicable risk model type, i.e., whether it is quantitative, qualitative, multi-locus quantitative, or multi-locus qualitative.

The Locus Database may additionally identify condition groups as sets of related conditions that should be tested and reported together. For example, a condition group could be the CFTR condition group comprising the following conditions: Cystic fibrosis, Congenital bilateral absence of the vas deferens, Cystic fibrosis (modifier MBL2), Cystic fibrosis (modifier TGFB1), and modifier of CFTR related conditions.

In certain embodiments, the information may be stored in the database so as to combine odds ratios for certain variants, markers, genes and/or conditions. In certain such embodiments, information in the database related to individualized combined odds ratios may demonstrate some or all of the following features: (a) a method of combining published odds ratios for independent variants into one risk score, (b) incorporation of the age of the patient (and how the associated risk changes over time), (c) incorporation of the ancestry of the patient, (d) incorporation of measures of uncertainty (confidence intervals), (e) use of logistic regression models, (f) easy extensibility to non-genetic factors, and (g) calculation of where an individual stands in relation to a reference population by generating a simulation for all possible risk combinations, based on HapMap allele frequencies if available and, if not, on published frequencies.
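
Features (a) and (g) can be sketched under explicit assumptions: variants act independently and log-additively (as in a logistic model without interaction terms), each published odds ratio applies per copy of the risk allele, and genotypes in the reference population follow Hardy-Weinberg proportions from the supplied allele frequencies. Instead of a random simulation, this sketch enumerates every genotype combination exactly, which is only practical for a modest number of variants. Age, ancestry and confidence intervals (features (b) to (d)) are not modeled, and all names are hypothetical.

```python
import math
from itertools import product


def combined_odds_ratio(per_allele_ors, risk_allele_counts):
    """Multiplicative (log-additive) combination of per-allele odds ratios."""
    return math.exp(sum(c * math.log(o) for o, c in zip(per_allele_ors, risk_allele_counts)))


def genotype_probs(p):
    """Hardy-Weinberg frequencies of 0, 1 and 2 risk-allele copies for allele frequency p."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def population_percentile(per_allele_ors, freqs, risk_allele_counts):
    """Fraction of the reference population whose combined OR is below the individual's.

    Enumerates all 3**n genotype combinations exactly rather than simulating.
    """
    target = combined_odds_ratio(per_allele_ors, risk_allele_counts)
    below = 0.0
    for combo in product(range(3), repeat=len(freqs)):
        prob = math.prod(genotype_probs(p)[c] for p, c in zip(freqs, combo))
        if combined_odds_ratio(per_allele_ors, combo) < target:
            below += prob
    return below


print(round(combined_odds_ratio([2.0, 1.5], [1, 2]), 6))  # 4.5
```

Working in log space keeps the combination additive, which is also what makes it straightforward to add further terms, such as non-genetic factors, to the same score.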

The database representation of genomic features permits, in certain embodiments of the invention, the computational manipulation, comparison, and conveyance of instances of variation as a single class of objects rather than as disjoint classes. In abstracting types of biological variation into a single variation type, the biological variation types may include, but are not limited to: (a) single nucleotide variants, (b) multi-nucleotide variants, (c) haplotypes, (d) insertions and conversions, (e) deletions and fusions, (f) duplications, (g) copy number variations, (h) inversions, and/or (i) translocations. Associated data may be linked to one or more abstract variant types. Such associated data may include, but is not limited to: (i) references, (ii) provenance, (iii) comments, (iv) risk models, (v) phenotypes, (vi) drug response, and/or (vii) morphology.
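
A single variation class of the kind described above might be sketched as follows. The sketch assumes 0-based, half-open coordinates, represents multi-site variants (haplotypes, translocations) as several loci, and keeps associated data in a free-form dictionary; the class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class VariantKind(Enum):
    SNV = "snv"
    MNV = "mnv"
    HAPLOTYPE = "haplotype"
    INSERTION = "insertion"      # also used for conversions in this sketch
    DELETION = "deletion"        # also used for fusions in this sketch
    DUPLICATION = "duplication"
    CNV = "copy_number"
    INVERSION = "inversion"
    TRANSLOCATION = "translocation"


@dataclass(frozen=True)
class Locus:
    chrom: str
    start: int  # 0-based, half-open interval (assumption)
    end: int


@dataclass
class Variant:
    kind: VariantKind
    loci: tuple  # one Locus for most kinds; several for haplotypes or translocations
    alleles: tuple = ()
    annotations: dict = field(default_factory=dict)  # references, provenance, risk models, ...

    def overlaps(self, other):
        """True if any locus of this variant overlaps any locus of the other."""
        return any(a.chrom == b.chrom and a.start < b.end and b.start < a.end
                   for a in self.loci for b in other.loci)
```

Because every kind shares the same `loci` representation, one comparison such as `overlaps` works across SNVs, deletions and translocations alike, which is the benefit of a single class over disjoint ones.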

Assay Development

In certain embodiments, the sequencing step 214 (see FIG. 2) includes a step of amplifying 208 loci of interest upstream of the sequencing platform. The Locus Database may be used to inform the amplification stage of the positional information of each marker, and a series of primer sets can be designed using automated tooling. These primer sets can optimize coverage of all regions at the highest multiplexing level possible. In one example, multiplexing levels can be in the range of 10 to 10,000 amplicons per well. Redundant coverage may be designed for any given locus of interest so that natural, and largely unknown, variations in the genome typically will not disrupt the targeted amplification to such a degree that sequences fail to be read for any of the target regions. After targeted amplification, the amplicons for each patient are barcoded with a sequence so that they can be delineated post-sequencing. The barcode sequence can be used by the analysis engine to assign the “sequence species” to each patient.
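The barcode-based assignment of reads to patients can be sketched as follows, assuming (hypothetically) that each barcode sits at the start of the read and that a read may be assigned when it lies within one mismatch of exactly one known barcode. The function names and the mismatch tolerance are illustrative choices, not parameters taken from the description.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcode_to_patient, barcode_len, max_mismatches=1):
    """Group read inserts by patient using a barcode at the start of each read.

    A read is assigned only when exactly one known barcode is within
    max_mismatches; otherwise it is set aside as "unassigned" so that an
    ambiguous read is never attributed to the wrong patient.
    """
    species = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        matches = []
        if len(tag) == barcode_len:
            matches = [patient for barcode, patient in barcode_to_patient.items()
                       if hamming(tag, barcode) <= max_mismatches]
        key = matches[0] if len(matches) == 1 else "unassigned"
        species[key].append(insert)
    return dict(species)


barcodes = {"ACGT": "patient-1", "TTGA": "patient-2"}
reads = ["ACGTGGGG", "ACGAGGCC", "TTGACCCC", "CCCCAAAA"]
print(demultiplex(reads, barcodes, barcode_len=4))
# {'patient-1': ['GGGG', 'GGCC'], 'patient-2': ['CCCC'], 'unassigned': ['AAAA']}
```

Requiring a unique match trades a few discarded reads for protection against assigning a read with an ambiguous barcode to the wrong patient.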

In certain embodiments, the invention provides an assays database, which is a data repository for managing information regarding molecular assays. Molecular assays are molecular reagents, or designs for producing molecular reagents, that are designed to assay molecular attributes, such as genetic variation, which are associated with a phenotypic state, such as the presence of inherited disease. Examples of molecular assays for genetic variation include oligo pairs for PCR amplification, oligo sequences for hybridization-based DNA capture, and reagents for whole-genome shotgun sequencing. The information which is tracked with molecular assays for genetic variation can include any or all of the following: (a) the identity of any nucleic acid sequences involved in the reagent, (b) the type of reagent (PCR, hybridization capture bait, etc.), (c) the region of the genome which is assayed by the reagent, (d) the known genomic variants which are assayed by this reagent, (e) the types of variants which are assayable with this reagent, (f) the genetic conditions which are associated with these variants, (g) quality assurance information (e.g., the pass/fail state of the assay for use in a molecular diagnostic process and/or metrics regarding the expected quality (coverage, expected sensitivity, specificity) of the assay), and (h) information associated with the genomic region which informs the relative success of the assay, such as the G+C content, the presence of low-complexity repeats, and the presence of similar regions in the reference sequence.
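As an illustrative sketch only, an assays-database record and two of the region features mentioned in item (h) might look like this in Python. The `AssayRecord` fields are a hypothetical subset of items (a) through (g), and the longest single-base run is used as one simple stand-in for a low-complexity measure; detecting similar regions in the reference sequence would require an alignment step that is not shown.

```python
from dataclasses import dataclass, field


def gc_percent(seq: str) -> float:
    """G+C content of a sequence, as a percentage of its length."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def longest_homopolymer(seq: str) -> int:
    """Length of the longest single-base run, a simple low-complexity signal."""
    best = run = 0
    prev = ""
    for base in seq.upper():
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


@dataclass
class AssayRecord:
    """Hypothetical assays-database entry."""
    assay_id: str
    reagent_type: str          # e.g. "PCR", "hybridization capture bait"
    sequences: list            # oligo sequences used by the reagent
    region: tuple              # (chrom, start, end) assayed by the reagent
    known_variants: list = field(default_factory=list)
    conditions: list = field(default_factory=list)
    qc_pass: bool = False

    def region_features(self, region_seq: str) -> dict:
        """Sequence features that bear on the expected success of the assay."""
        return {"gc_percent": gc_percent(region_seq),
                "longest_homopolymer": longest_homopolymer(region_seq)}


record = AssayRecord("amp-0001", "PCR", ["ACGTACGT", "TTGGCCAA"], ("7", 100, 300))
print(record.region_features("ACGTTTTTAC"))
```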

Sample Information Management System

After intake and accessioning, a sample and its associated information can be tracked through the testing process until the results are reported to the requisitioning client. The sample information management system, or SIMS, is informed by the requisition engine 10 of the identity of the patient and/or client from whom the sample is obtained and of the set of requisitioned tests for the sample. The SIMS can perform in-test analysis and quality checks, as well as raw sequence quality checks, before the data is sent to the Analysis Engine. In addition, the SIMS can perform functions that a conventional laboratory information management system (“LIMS”) would typically perform. The SIMS may also maintain operator data and can perform statistical analyses to continually improve the in-process testing and quality control. In certain embodiments, the sample information management system is a software system for managing the entire sample life-cycle along with associated primary data (e.g., DNA sequence reads, instrument data files, lab personnel observations), secondary data (e.g., quality control (“QC”) metrics, statistical data, etc.), and process data (e.g., workflow QC, sample quality metrics, automation consistency).

In certain embodiments, the sample information management system has the following primary functions: (a) integration with the upstream requisitioning system to allow for eventual de-anonymization when sample reports are ready to be reported to a client, (b) tracking of all states and events associated with the processing of patient samples, as well as providing instructions and protocols to laboratory personnel in a manner compliant with state and federal regulations, including as they relate to normal operations, generation of deviation reports, and related information, (c) controlling the processing work-flow, including QC controls, failure modes, and failure recovery, (d) tracking all data generated during sample processing, including both primary and secondary processing data as produced by lab personnel, external sources, lab equipment, and related sources, (e) tracking all reagents used in the work-flow, the batches/lots they belong to, the vendors from which they were procured, vendor QC metrics, and related information, (f) tracking all physical machines, personnel, and data access involved in the work-flow, such as QC engineers, lab technicians, genetic sequencers, thermal cyclers, and related equipment, (g) integration with the report generation system and the Requisition Engine, and/or (h) allowing retrospective QC.
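Functions (b) and (c) can be sketched as a small state machine that records every transition as an audit event and rejects transitions the work-flow does not allow, including a recovery path out of a failed state. The state names, the transition table and the `SampleTracker` class are hypothetical; a production SIMS would persist these events rather than keep them in memory.

```python
from datetime import datetime, timezone

# Hypothetical work-flow states and the transitions allowed between them.
ALLOWED = {
    "accessioned": {"extracted", "failed"},
    "extracted": {"amplified", "failed"},
    "amplified": {"sequenced", "failed"},
    "sequenced": {"qc_passed", "failed"},
    "qc_passed": {"reported"},
    "failed": {"accessioned"},     # failure recovery: re-queue the sample
    "reported": set(),
}


class SampleTracker:
    """Tracks one sample's state and keeps an audit trail of every transition."""

    def __init__(self, sample_id: str):
        self.sample_id = sample_id
        self.state = "accessioned"
        self.events = []  # (timestamp, from_state, to_state, operator, note)

    def transition(self, new_state: str, operator: str, note: str = "") -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.sample_id}: {self.state} -> {new_state} not allowed")
        self.events.append((datetime.now(timezone.utc), self.state, new_state,
                            operator, note))
        self.state = new_state


tracker = SampleTracker("S-0001")
tracker.transition("extracted", "tech-a")
tracker.transition("failed", "tech-a", "low DNA yield")
tracker.transition("accessioned", "qc-eng", "re-queued after deviation report")
```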

In certain embodiments, the SIMS is configurable and can accommodate multiple sample-processing work-flows, sample transformations such as split/pool/merge/transfer of aliquots, sample multiplexing, etc., along with all associated data (primary, secondary and process) generated at every step of each work-flow. The SIMS can act as a repository for all generated primary, secondary, and process data. This data may be stored directly in the SIMS (in a database, a file system, or any mixture thereof) or may be stored externally (in databases, file systems, specialized data stores, or any mixture thereof). Data is entered into the SIMS by automatically or manually sending and receiving data to and from individual machines involved in the work-flow by multiple means, such as over the local computer network, over the Internet, or via handheld devices, as well as through data entry in the SIMS user interface, with or without assistance from external devices such as bar-code scanners.

In certain embodiments, the SIMS can apply continuous and statistical process monitoring to improve the quality and robustness of molecular genetics testing. Some of the process elements which can be monitored include: (i) switched samples, (ii) sample contamination, (iii) reagent lot quality, and (iv) analytical instrument quality. To detect the presence of switched samples (e.g., misidentified laboratory samples), one or more of the following method steps may be performed. A gender check process may be performed for a sample by performing molecular assays which assay the presence or absence and the copy number of the sex chromosomes (X and Y). The molecular gender (XX, XY, XXY, etc.) is compared with the reported gender for that sample. If a discordance in gender is detected, such an event suggests that the samples may have been misidentified, and optionally the sample is further checked or the analysis is repeated with a fresh sample. A genetic “fingerprint” step can be performed to check for the possibility of a switched sample by (i) reserving a portion of a sample (the reserved sample) for generating a genotypic fingerprint associated with the sample, (ii) running molecular assays for the sample which are designed to detect genomic variants that are present at a high frequency in multiple human sub-populations (in certain embodiments, approximately 20 or more such loci are assayed), the collection of results providing a genetic fingerprint for the sample, (iii) verifying the correct labeling of the sample by generating the applicable genetic fingerprint for the reserved sample, preferably using an orthogonal method for assaying genotype such as real-time PCR, high-throughput genotyping arrays, or capillary electrophoresis sequencing, and (iv) comparing the genetic fingerprints of the sample and the reserved sample, such that a discordance between the two genetic fingerprints suggests the occurrence of a misidentified sample, in which case a fresh sample is optionally obtained to replace or confirm the analysis of the original sample.
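The two switched-sample checks can be sketched in Python as follows. The genotype encoding (0, 1 or 2 copies of the alternate allele per locus), the locus names and the 95% concordance threshold are assumptions made for illustration. A karyotype such as XXY reported as male is flagged by this sketch even though it may reflect a genuine chromosomal finding rather than a switched sample, which is why a discordance calls for further checking rather than automatic rejection.

```python
def molecular_sex(x_copies: int, y_copies: int) -> str:
    """Karyotype string such as 'XX', 'XY' or 'XXY' from sex-chromosome copy numbers."""
    return "X" * x_copies + "Y" * y_copies


def sex_discordant(x_copies: int, y_copies: int, reported: str) -> bool:
    """True when the assayed karyotype differs from the one expected for the
    reported gender; an unrecognized reported value is not checked."""
    expected = {"female": "XX", "male": "XY"}.get(reported.strip().lower())
    return expected is not None and molecular_sex(x_copies, y_copies) != expected


def fingerprint_concordance(fp_a: dict, fp_b: dict) -> float:
    """Fraction of shared loci with identical genotypes (0, 1 or 2 alternate alleles)."""
    shared = [locus for locus in fp_a if locus in fp_b]
    if not shared:
        return 0.0
    return sum(fp_a[locus] == fp_b[locus] for locus in shared) / len(shared)


def possible_switch(fp_sample: dict, fp_reserved: dict, min_concordance: float = 0.95) -> bool:
    """Flag a possible switched sample when the two fingerprints disagree too often."""
    return fingerprint_concordance(fp_sample, fp_reserved) < min_concordance


sample = {"locus01": 0, "locus02": 1, "locus03": 2, "locus04": 1}
reserved = {"locus01": 0, "locus02": 1, "locus03": 2, "locus04": 0}
print(sex_discordant(1, 1, "female"), possible_switch(sample, reserved))  # True True
```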

In certain embodiments, the invention provides methods for the detection of sample contamination (e.g., assay results arising from non-pure samples), which methods may include one or more of the following approaches. In certain embodiments, a genetic fingerprint as described herein (measured with respect to a certain number of high-frequency variants) can be used to detect cross-sample contamination. For germline DNA testing, alleles can be expected to be present in 2:0, 1:1, or 0:2 ratios. The presence of variants at non-standard ratios (especially low-frequency ratios) can indicate a possibly non-pure sample. In certain embodiments, the invention provides methods of detecting possible sample contamination comprising the steps of generating a genetic fingerprint for a sample, checking for the presence of non-standard allele ratios (i.e., ratios other than approximately 2:0, 1:1, or 0:2), and determining that sample contamination may have occurred if a non-standard ratio is detected.
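The allele-ratio check can be sketched as below, assuming read counts for the reference and alternate allele at each fingerprint locus. A germline call is expected to have an alternate-allele fraction near 0, 0.5 or 1 (the 2:0, 1:1 and 0:2 ratios above), and loci far from all three are flagged. The tolerance, minimum depth and function names are hypothetical; a tighter tolerance would catch lower-level contamination at the cost of more false alarms.

```python
def alt_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads supporting the alternate allele."""
    total = ref_reads + alt_reads
    return alt_reads / total if total else 0.0


def nonstandard_loci(allele_counts, tolerance=0.15, min_depth=20):
    """Loci whose allele balance is far from 0, 0.5 and 1.

    allele_counts maps locus -> (ref_reads, alt_reads).
    """
    flagged = []
    for locus, (ref_n, alt_n) in allele_counts.items():
        if ref_n + alt_n < min_depth:
            continue  # too few reads to judge the ratio
        frac = alt_fraction(ref_n, alt_n)
        if min(abs(frac - expected) for expected in (0.0, 0.5, 1.0)) > tolerance:
            flagged.append(locus)
    return flagged


def possibly_contaminated(allele_counts, max_flagged=0, **kwargs) -> bool:
    """Report possible contamination when more than max_flagged loci look non-standard."""
    return len(nonstandard_loci(allele_counts, **kwargs)) > max_flagged


clean = {"L1": (50, 0), "L2": (26, 24), "L3": (1, 49)}
mixed = {"L1": (45, 5), "L2": (26, 24), "L3": (40, 10), "L4": (3, 2)}
print(nonstandard_loci(mixed), possibly_contaminated(clean))  # ['L3'] False
```

In the `mixed` example, locus L3 has an alternate fraction of 0.2 and is flagged, locus L1 (0.1) falls within the default tolerance, and locus L4 is skipped for low depth.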

In certain embodiments, the invention provides for a method of detecting sample contamination comprising the steps of (i) ligating DNA barcodes to both ends of the starting DNA fragments derived from a sample, (ii) reading the DNA barcodes in the sequencing process, and (iii) checking for incorrect barcodes as an indicator of possible sample misidentification.
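A sketch of the dual-barcode check follows, assuming (hypothetically) that each read carries a barcode of known length at both its 5' and 3' ends and that the 3' barcode is reported in the same orientation as the expected pair. Reads whose end pair does not match the barcode pair assigned to the sample are counted as evidence of possible misidentification; the function name is illustrative.

```python
def check_dual_barcodes(reads, expected_pair, bc_len):
    """Count reads whose 5' and 3' barcodes differ from the sample's expected pair.

    Returns (number of mismatched reads, fraction of reads mismatched).
    """
    mismatched = 0
    for read in reads:
        pair = (read[:bc_len], read[-bc_len:])
        if pair != expected_pair:
            mismatched += 1
    return mismatched, (mismatched / len(reads) if reads else 0.0)


# Inserts are shown in lower case only to make the barcode boundaries visible.
reads = ["AAAAcgtacgTTTT", "AAAAggggccccTTTT", "AAAAttttCCCC", "GGGGaaaaTTTT"]
print(check_dual_barcodes(reads, ("AAAA", "TTTT"), 4))  # (2, 0.5)
```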

In certain embodiments, the invention provides methods for the assessment of total process quality comprising one or more of the following steps. Control samples may be used to test overall process quality. To verify the proper functioning of the reagents, processes, and instruments involved in the diagnostic workflow, control samples from well-characterized molecular sources (for example, DNA from the Coriell cell repository, a well-characterized source of genomic DNA) may be assayed periodically to verify process consistency, and possible defects in process quality may be noted in the event that an assay of a control source yields a result different from its reference value. Process-monitoring metrics may be tracked, continuously or episodically, and observed to verify operational consistency. Examples of such metrics include the following: (i) the amount of DNA after the fragmentation step, (ii) the distribution of DNA fragment lengths, (iii) the number of aligned sequencing reads per sample, (iv) the average read length of sequencing reads, (v) average sequencing quality (Phred score, observed error), (vi) the total number of aligned bases, (vii) the number of variants called, and/or (viii) the ratio of transversions to transitions among called variants. In certain embodiments, the invention provides for the use of one or more of these quality monitoring steps and the comparison of the results of such quality monitoring step(s) with the applicable reference standard or expected range of values for such quality metrics. Observations of quality metrics may be collected, analyzed, presented and/or observed using any of (i) tables of the mean, variance, median, mode, or other statistical characterizations versus time, so as to show changes in such values over time, (ii) box plots, (iii) control charts, and (iv) correlations of variations in metrics with changes in other elements of the process, such as changes in reagent lots, instrument maintenance, lab personnel changes, environmental effects, etc.
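Item (iii), control charts, can be sketched with Shewhart-style limits computed from a baseline period of a monitored metric, for example aligned reads per sample. The three-standard-deviation limits are a common convention rather than a value stated in the description, and the function names are hypothetical.

```python
import statistics


def control_limits(baseline, k=3.0):
    """Shewhart-style limits: baseline mean plus or minus k sample standard deviations."""
    mean = statistics.mean(baseline)
    sd = statistics.stdev(baseline)
    return mean - k * sd, mean + k * sd


def out_of_control(baseline, observations, k=3.0):
    """Indices of new observations that fall outside the control limits."""
    low, high = control_limits(baseline, k)
    return [i for i, x in enumerate(observations) if not low <= x <= high]


baseline = [100, 102, 98, 101, 99, 100, 103, 97]
print(control_limits(baseline))                               # (94.0, 106.0)
print(out_of_control(baseline, [100, 101, 80, 99, 125]))      # [2, 4]
```

With this baseline (mean 100, standard deviation 2) the limits are 94 and 106, so later values of 80 and 125 would be flagged for review and could then be correlated with reagent lot or instrument changes as described in item (iv).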

Analysis Engine

Certain embodiments comprise an Analysis Engine that combines hardware and software components that read and interact with sequence information resulting from sample testing. The Analysis Engine can comprise instruments, processors and interfaces. The Analysis Engine typically receives raw sequences and sorts and identifies the “sequence species” for each patient based on the barcode sequence. Next, loci of interest can be queried against a locus database according to instructions provided by requisition engine 10. Clinical results can be processed and ported to the data sort and cross-reference component 12. Data may be transmitted to the repository 17 for storage in one or more databases. The data is typically rendered anonymous prior to transmission to the repository 17. Other clinical/patient information may be transmitted to the repository, as indicated by requisition engine 10 or based on requests from the collaborator portal 18.
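One way to render data anonymous before transmission to the repository, while still allowing the requisitioning side to relate later requests to stored results, is a keyed hash, sketched below with Python's standard `hmac` and `hashlib` modules. This is an assumed technique rather than one specified in the description; the field names and the 16-character truncation are hypothetical. A keyed identifier of this kind is pseudonymous rather than strictly anonymous, since the holder of the key can recompute it.

```python
import hashlib
import hmac

# Hypothetical fields treated as identifying and therefore never sent to the repository.
IDENTIFYING = ("patient_name", "date_of_birth", "patient_id", "requisition_id")


def pseudonymous_id(secret_key: bytes, patient_id: str, requisition_id: str) -> str:
    """Stable identifier that cannot be recomputed from the IDs without secret_key."""
    message = f"{patient_id}|{requisition_id}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()[:16]


def anonymize(record: dict, secret_key: bytes) -> dict:
    """Copy a result record, dropping identifying fields and adding the pseudonym."""
    out = {k: v for k, v in record.items() if k not in IDENTIFYING}
    out["sample_key"] = pseudonymous_id(secret_key, record["patient_id"],
                                        record["requisition_id"])
    return out


key = b"example-key-only"   # in practice held only by the requisition side
result = {"patient_name": "Jane Doe", "date_of_birth": "1970-01-01",
          "patient_id": "P1", "requisition_id": "R9",
          "marker": "M1", "genotype": "0/1"}
print(anonymize(result, key))
```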

In certain embodiments, the Analysis Engine may be used to analyze information so as to combine odds ratios for certain variants, markers, genes and/or conditions. In certain such embodiments, information related to the analysis of individualized combined odds ratios may demonstrate some or all of the following features: (a) a method of combining published odds ratios for independent variants into one risk score, (b) incorporation of the age of the patient (and how the associated risk changes over time), (c) incorporation of the ancestry of the patient, (d) incorporation of measures of uncertainty (confidence intervals), (e) use of logistic regression models, (f) easy extendability to non-genetic factors, and (g) calculation of where an individual stands in relation to a reference population by generating a simulation for all possible risk combinations, based on HapMap allele frequencies if available and, if not, on published frequencies.
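Feature (g) can be illustrated by exact enumeration rather than random simulation: for a small number of variants, every genotype combination is scored and weighted by its Hardy-Weinberg probability computed from supplied allele frequencies (for example, HapMap frequencies). The sketch assumes independent variants and a log-additive score (the sum over variants of allele count times the log odds ratio); the function names are hypothetical, and because the number of combinations grows as 3^n, a sampling approach would be needed for larger panels.

```python
import itertools
import math


def hwe_genotype_probs(p: float):
    """Hardy-Weinberg probabilities of carrying 0, 1 or 2 risk alleles."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def population_fraction_below(odds_ratios, allele_freqs, individual_counts):
    """Fraction of a reference population with a strictly lower risk score.

    Enumerates all 3**n genotype combinations, scores each as
    sum(count * ln OR) and weights it by its Hardy-Weinberg probability,
    assuming the variants are independent.
    """
    log_ors = [math.log(o) for o in odds_ratios]
    target = sum(c * w for c, w in zip(individual_counts, log_ors))
    probs = [hwe_genotype_probs(p) for p in allele_freqs]
    below = 0.0
    for combo in itertools.product(range(3), repeat=len(odds_ratios)):
        score = sum(c * w for c, w in zip(combo, log_ors))
        if score < target - 1e-12:      # small margin guards against float ties
            below += math.prod(probs[i][c] for i, c in enumerate(combo))
    return below


print(population_fraction_below([2.0, 1.5], [0.5, 0.5], [2, 2]))
```

For two hypothetical variants with odds ratios of 2.0 and 1.5 and allele frequencies of 0.5, an individual homozygous for both risk alleles scores above 93.75% of the reference population.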

In certain embodiments of the invention, analysis is conducted on a combination of data sources, such as data from one or more laboratory workflows, to derive attributes which are associated with phenotypic state, for example as illustrated in FIG. 9. Such analysis from multiple workflows may involve analysis of a combination of tissue/sample types, such as one or more of the following: (i) germline genetic variants derived from analyzing genomic and/or mitochondrial DNA derived from one or more of normal tissue, blood samples, saliva samples, or other sources, or genetic variants obtained from various other platforms (e.g., real-time PCR, array hybridization, fragment analysis methods, mass spectrometry, DNA sequencing), (ii) somatic genetic variants derived from analyzing genomic and/or mitochondrial DNA derived from one or more of normal tissue or neoplastic (tumor) cells, or genetic variants obtained from various other platforms (e.g., real-time PCR, array hybridization, fragment analysis methods, mass spectrometry, DNA sequencing), (iii) germline genetic variants derived from analyzing complementary DNA (cDNA) derived from one or more of normal tissue, blood samples, saliva samples, or other sources, or genetic variants obtained from various other platforms (e.g., real-time PCR, array hybridization, fragment analysis methods, mass spectrometry, DNA sequencing), (iv) somatic genetic variants derived from analyzing complementary DNA (cDNA) derived from one or more of normal tissue or neoplastic (tumor) cells, or genetic variants obtained from various other platforms (e.g., real-time PCR, array hybridization, fragment analysis methods, mass spectrometry, DNA sequencing), and/or (v) gene expression levels derived from analyzing mRNA or cDNA derived from one or more of normal tissue or neoplastic (tumor) cells, or gene expression levels obtained from various other platforms (e.g., real-time PCR, array hybridization, fragment analysis methods, mass spectrometry, DNA sequencing).

Data Marshalling

Certain embodiments comprise a data management element 12 that receives test results and that sorts and cross-references the test results. Data element 12 can maintain clinically relevant information associated with requisitioned conditions/markers, which is then served to the results delivery element 13. Data element 12 can also hold “keys” to uninterpreted data that can be called on by the requisition engine 10. Responsive to a subsequent request forwarded by requisition engine 10, data element 12 can cause the Analysis Engine to interpret the appropriate sequence, thereby producing additional results that can be provided to the client.

Repository

In certain embodiments, repository 17 comprises one or more databases that store the raw data obtained from the test and analysis processes, as well as clinical data from the Data Menu. Clinical data is typically rendered anonymous. The repository databases can typically be accessed by outside collaborators, according to their distinct access rights, through portal 18. In some embodiments, portions of the information in the repository databases may be visible to the general public through a module in the portal 18. In certain embodiments of the invention, the repository may contain some or all of the following: (i) sample-derived data, including (x) attribute data (such as variant calls and expression levels) produced using the methods of the invention described herein and/or produced by a third party and (y) genomic variants and sequence data not limited to known loci associated with genetic conditions; (ii) sample metadata, including ethnicity, age, gender, geographical origin, risk behaviors, and/or environmental influences; (iii) sample-associated phenotype information, such as disease status, qualitative trait assessments (e.g., description of symptoms), quantitative trait assessments (e.g., blood pressure, lung capacity, blood marker levels), and/or current prescriptions (Rx); and (iv) study metadata, such as information associated with the client or collaborator producing the samples, and/or the study name, purpose, and experimental design.

In certain embodiments, the repository permits mining of one or more of its databases for novel associations between genetic variation or other attributes and phenotypes. Algorithms suitable for data-mining the repository may include, but are not limited to, the following: linear regression, logistic regression, classification trees, hierarchical clustering, k-means clustering, Bayesian networks, neural networks, and support vector machines.
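As a simple illustration of mining for associations, the sketch below computes an allelic odds ratio with a Woolf (log-based) 95% confidence interval for each variant from case and control allele counts, and keeps variants whose lower bound exceeds 1. This single-variant test is a stand-in chosen for brevity rather than one of the listed algorithms; it does not correct for multiple testing, and the function names are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref, z=Z_95):
    """Odds ratio and Woolf 95% CI for a 2x2 table of allele counts.

    A Haldane correction (+0.5 to every cell) is applied when any cell is zero.
    """
    cells = [case_alt, case_ref, ctrl_alt, ctrl_ref]
    if 0 in cells:
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def screen(tables_by_variant, min_lower_ci=1.0):
    """Variants whose CI lower bound exceeds min_lower_ci (candidate risk associations)."""
    hits = []
    for variant, counts in tables_by_variant.items():
        odds_ratio, lower, _upper = allelic_association(*counts)
        if lower > min_lower_ci:
            hits.append((variant, round(odds_ratio, 2)))
    return hits


tables = {"v1": (30, 70, 10, 90), "v2": (20, 80, 19, 81)}
print(screen(tables))  # [('v1', 3.86)]
```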

In certain embodiments, the repository permits multiple attributes, across attribute types, to be associated with a phenotypic likelihood score. In certain embodiments, the repository permits this score to be conditioned on sample metadata such as age or ethnicity, and the repository may provide for the graphical description of this score using methods such as box-and-whisker plots or histograms. The lack of an association between a phenotype and certain attributes can be used as information that a given phenotype might be more complex, or less directly linked to the molecular signatures of the associated attributes.

In certain embodiments, the repository permits the database to be mined for quality control purposes, such as quality control in connection with the clinical diagnostic process. Associations present in the repository database can be tested against known associations from the curation database to check the veracity of either the curation database or the results presenting such associations. In certain embodiments, statistical measures of the data are compared with comparable studies and cohorts to verify the integrity of the data generation process.

In certain embodiments, the repository is configured to permit access by third-party researchers, where access to the repository is limited to samples and information which have been approved for research use. Different users may have different levels of access to, and control over, the repository and its contents. In certain embodiments, levels of access and control can be restricted for certain populations of users of the repository to one or more of the following categories: (i) genomic locations only, (ii) genomic locations and alleles only, (iii) genomic locations, alleles and associated risk models, (iv) data restricted by conditions (for example, CF only), (v) data restricted by category of conditions (for example, rare Mendelian disorders only), (vi) data restricted by legal constraints (for example, non-patented gene tests only), and/or (vii) data restricted by arbitrary sets as defined by Locus or third parties.

Data in the repository may be restricted to authorized clients as defined by contractual agreements. For example, some clients may be permitted to view all data which has been made “available for research purposes”. The repository can be configured so that authorized individuals can have access to all of the data. The data repository can be configured not to store any identifying information for samples (e.g., patient names or dates of birth), as required by regulations and the desired or required level of privacy. Access and control with respect to the repository or parts thereof may also be configured to enable different levels of access and control depending upon the identity of the client and the role of persons within the client (e.g., principal investigators, technicians, etc.). Role-based access to the repository may include controlled access to any or all functions such as (i) searching the repository, (ii) the types of information returned (loci only, assays and loci, condition information only), (iii) adding items to the data repository, (iv) deleting items from the data repository, and/or (v) modifying items in the data repository. Users of the repository may have their level of access or control determined by their designated role and the level of access and control associated with that role. A user may be associated with an account profile (e.g., comprising user name, password, institution, contact information, access levels, etc.) which may be stored with the repository and define the levels of access permitted for that user. Levels of access and control with respect to databases of the invention, including the repository, may also be designated with respect to automated access performed by software agents.
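The tiered data levels of categories (i) through (iii), the condition restriction of category (iv), and role-based operations can be combined in a short sketch. The role names, field names and permission sets below are hypothetical examples; a deployed repository would load them from stored account profiles rather than hard-coding them.

```python
from dataclasses import dataclass
from enum import IntEnum


class DataLevel(IntEnum):
    """Cumulative data tiers: each level includes the fields of the levels below it."""
    LOCATIONS = 1
    ALLELES = 2
    RISK_MODELS = 3


FIELDS_BY_LEVEL = {
    DataLevel.LOCATIONS: {"chrom", "position"},
    DataLevel.ALLELES: {"ref", "alt"},
    DataLevel.RISK_MODELS: {"risk_model"},
}

# Hypothetical roles and the repository operations each may perform.
ROLE_ACTIONS = {
    "viewer": {"search"},
    "technician": {"search", "add"},
    "principal_investigator": {"search", "add", "modify", "delete"},
}


@dataclass
class AccountProfile:
    username: str
    role: str
    data_level: DataLevel
    allowed_conditions: frozenset = frozenset()   # empty means unrestricted

    def can(self, action: str) -> bool:
        return action in ROLE_ACTIONS.get(self.role, set())

    def view(self, record: dict):
        """Fields of record this user may see, or None if the condition is excluded."""
        if self.allowed_conditions and record.get("condition") not in self.allowed_conditions:
            return None
        visible = {"condition"}
        for level, fields in FIELDS_BY_LEVEL.items():
            if level <= self.data_level:
                visible |= fields
        return {k: v for k, v in record.items() if k in visible}


record = {"condition": "CF", "chrom": "7", "position": 1000, "ref": "G",
          "alt": "A", "risk_model": "recessive", "sample_key": "abc123"}
viewer = AccountProfile("alice", "viewer", DataLevel.ALLELES, frozenset({"CF"}))
print(viewer.view(record), viewer.can("delete"))
```

Fields outside every tier, such as the internal `sample_key`, are never returned, so the default is to withhold rather than expose.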

Delivery

In certain embodiments, delivery module 13 processes clinical data for each condition/marker received from the data module 12. Clinical data and results can be formatted according to client preferences, as indicated by the Requisition Engine 10. For example, different clients may want to utilize different markers for the same condition, or to test more or fewer genes for a condition of interest. While the information in the data element 12 is comprehensive, the information served by the delivery module 13 is typically tailored to client specifications.

Client Data Integration

In certain embodiments, a Lab Report is delivered in a manner pre-determined for each client. Software and hardware combinations are provided that enable the information from the delivery element 13 to be transferred to the client's EMR system, to a physician/patient portal, to a company portal, and/or to a fax processing system. Certain elements of the system may be directly integrated into client systems. The integration may include background or threshold notification handlers and a module for supporting client re-requisitions.

In certain embodiments of the invention, the invention permits a client to contact an expert in the field related to a test result or analysis that the client has received from the delivery element 13 and/or is accessing through the Collaborator Portal element 18. Through the Collaborator Portal and/or any other software or information service, the client may be connected with an expert based on the marker(s), set of genome coordinates, variant(s), gene(s), condition(s), and/or group(s) of conditions of interest.

Collaborator Portal

As noted above, certain embodiments provide collaborators with access to data in the repository 17. Collaborators can access data from the Locus Database or other databases, including repository databases, as controlled by certain specific permissions and privileges. Certain collaborators can be provided access to upload curation information into the Locus Database, supplementing the curation tools embodied in the system. In some embodiments, certain specific condition/marker information can be made available for public viewing, typically in support of ongoing research. In certain embodiments, selective access to the curation database may be granted to experts in the field, collaborators or clients to participate in the curation process by uploading, entering, suggesting or commenting on variants, genes, markers and/or conditions through software tools or a ‘wiki’ in the portal or otherwise made available. Additionally, access to aspects of certain curation databases may be granted to experts in the field, collaborators or clients to comment on, rank, prioritize or otherwise give input on information in the database through software tools or a ‘wiki’ in the portal or otherwise made available.

System Description

Turning now to FIG. 3, certain embodiments of the invention employ a processing system that includes at least one computing system 300 deployed to perform certain of the steps described above. Computing system 300 may be a commercially available system that executes a commercially available operating system such as Microsoft Windows®, UNIX or a variant thereof, Linux, a real-time operating system and/or a proprietary operating system. The architecture of the computing system may be adapted, configured and/or designed for integration in the processing system, or for embedding in one or more of an image capture system, a manufacturing/machining system and/or a graphics processing workstation. In one example, computing system 300 comprises a bus 302 and/or other mechanisms for communicating between processors, whether those processors are integral to the computing system 300 (e.g., 304, 305) or located in different, perhaps physically separated, computing systems 300. Device drivers 303 may provide output signals used to control internal and external components.

Computing system 300 also typically comprises memory 306 that may include one or more of random access memory (“RAM”), static memory, cache, flash memory and any other suitable type of storage device that can be coupled to bus 302. Memory 306 can be used for storing instructions and data that can cause one or more of processors 304 and 305 to perform a desired process. Main memory 306 may be used for storing transient and/or temporary data such as variables and intermediate information generated and/or used during execution of the instructions by processor 304 or 305. Computing system 300 also typically comprises non-volatile storage such as read only memory (“ROM”) 308, flash memory, memory cards or the like; non-volatile storage may be connected to the bus 302, but may equally be connected using a high-speed universal serial bus (USB), Firewire or other such bus that is coupled to bus 302. Non-volatile storage can be used for storing configuration and other information, including instructions executed by processors 304 and/or 305. Non-volatile storage may also include mass storage device 310, such as a magnetic disk, optical disk or flash disk, that may be directly or indirectly coupled to bus 302 and used for storing instructions to be executed by processors 304 and/or 305, as well as other information.

Computing system 300 may provide an output for a display system 312, such as an LCD flat panel display, including touch panel displays, an electroluminescent display, a plasma display, a cathode ray tube or another display device that can be configured and adapted to receive and display information to a user of computing system 300. Typically, device drivers 303 can include a display driver, graphics adapter and/or other modules that maintain a digital representation of a display and convert the digital representation to a signal for driving a display system 312. Display system 312 may also include logic and software to generate a display from a signal provided by system 300. In that regard, display 312 may be provided as a remote terminal or in a session on a different computing system 300. An input device 314 is generally provided locally or through a remote system and typically provides for alphanumeric input as well as cursor control 316 input, such as a mouse, a trackball, etc. It will be appreciated that input and output can be provided to a wireless device such as a PDA, a tablet computer or other system suitably equipped to display the images and provide user input.

According to one embodiment of the invention, processor 304 executes one or more sequences of instructions. For example, such instructions may be stored in main memory 306, having been received from a computer-readable medium such as storage device 310. Execution of the sequences of instructions contained in main memory 306 causes processor 304 to perform process steps according to certain aspects of the invention. In certain embodiments, functionality may be provided by embedded computing systems that perform specific functions, wherein the embedded systems employ a customized combination of hardware and software to perform a set of predefined tasks. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.

The term “computer-readable medium” is used to define any medium that can store and provide instructions and other data to processor 304 and/or 305, particularly where the instructions are to be executed by processor 304 and/or 305 and/or another peripheral of the processing system. Such media can include non-volatile storage, volatile storage and transmission media. Non-volatile storage may be embodied on media such as optical or magnetic disks, including DVD, CD-ROM and Blu-ray. Storage may be provided locally and in physical proximity to processors 304 and 305, or remotely, typically by use of a network connection. Non-volatile storage may be removable from computing system 300, as in the example of Blu-ray, DVD or CD storage or memory cards or sticks that can be easily connected to or disconnected from a computer using a standard interface, including USB, etc. Thus, computer-readable media can include floppy disks, flexible disks, hard disks, magnetic tape, any other magnetic medium, CD-ROMs, DVDs, Blu-ray, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH/EEPROM, any other memory chip or cartridge, or any other medium from which a computer can read.

Transmission media can be used to connect elements of the processing system and/or components of computing system 300. Such media can include twisted pair wiring, coaxial cables, copper wire and fiber optics. Transmission media can also include wireless media such as radio, acoustic and light waves. In particular, radio frequency (RF), fiber optic and infrared (IR) data communications may be used.

Various forms of computer-readable media may participate in providing instructions and data for execution by processor 304 and/or 305. For example, the instructions may initially be retrieved from a magnetic disk of a remote computer and transmitted over a network or modem to computing system 300. The instructions may optionally be stored in a different storage or a different part of storage prior to or during execution.

Computing system 300 may include a communication interface 318 that provides two-way data communication over a network 320 that can include a local network 322, a wide area network or some combination of the two. For example, an integrated services digital network (ISDN) may be used in combination with a local area network (LAN). In another example, a LAN may include a wireless link. Network link 320 typically provides data communication through one or more networks to other data devices. For example, network link 320 may provide a connection through local network 322 to a host computer 324 or to a wide area network such as the Internet 328. Local network 322 and Internet 328 may both use electrical, electromagnetic or optical signals that carry digital data streams.

Computing system 300 can use one or more networks to send messages and data, including program code and other information. In the Internet example, a server 330 might transmit requested code for an application program through Internet 328, and computing system 300 may receive in response a downloaded application that provides the functionality described in the examples above. The received code may be executed by processor 304 and/or 305.

ADDITIONAL DESCRIPTIONS OF CERTAIN ASPECTS OF THE INVENTION

The foregoing descriptions of the invention are intended to be illustrative and not limiting. For example, those skilled in the art will appreciate that the invention can be practiced with various combinations of the functionalities and capabilities described above, and can include fewer or additional components than described above. Certain additional aspects and features of the invention are further set forth below, and can be obtained using the functionalities and components described in more detail above, as will be appreciated by those skilled in the art after being taught by the present disclosure.

Certain embodiments of the invention provide systems and methods for analyzing diagnostic information. Certain of these embodiments extract genomic DNA from a sample. Certain of these embodiments obtain sequence data for a plurality of markers. In certain embodiments, the plurality of markers includes markers associated with one or more tests identified in a requisition corresponding to the sample. Certain of these embodiments generate a response to the requisition. In certain embodiments, the response is limited to an analysis of sequence data corresponding to the one or more tests. Certain of these embodiments assign an identifier to the sequence data. In certain embodiments, the identifier identifies the sample, the requisition and information related to the analysis. Certain of these embodiments store the sequence data in a repository of data. In certain embodiments, the repository provides data for analysis. Certain of these embodiments report additional results of an analysis of the sequence data in response to a second requisition. In certain embodiments, the second requisition identifies a test different from the one or more tests.
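The requisition flow described above can be sketched as follows. Sequence data is stored once under an identifier. The first response covers only the requisitioned tests, and a later requisition for a different test is answered from the same stored data. This is a minimal Python illustration under stated assumptions, not the system's actual implementation. The test catalogue, marker names, `Repository` class and `respond` function are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class StoredSample:
    """Hypothetical repository entry: genotype calls keyed by marker name."""
    marker_calls: Dict[str, str]
    analysis_date: str


@dataclass
class Repository:
    samples: Dict[str, StoredSample] = field(default_factory=dict)

    def store(self, identifier: str, sample: StoredSample) -> None:
        self.samples[identifier] = sample


# Hypothetical test catalogue: each test lists the markers it interprets.
TEST_MARKERS: Dict[str, Set[str]] = {
    "CF": {"CFTR:var1", "CFTR:var2"},
    "HNPCC": {"MLH1:var1", "MSH2:var1"},
}


def respond(repo: Repository, identifier: str, tests: List[str]) -> Dict[str, Dict[str, str]]:
    """Report only the markers tied to the requisitioned tests.

    Markers sequenced for other tests stay in the repository and can be
    reported later, without re-sequencing, when a second requisition names them.
    """
    calls = repo.samples[identifier].marker_calls
    return {
        test: {m: calls[m] for m in sorted(TEST_MARKERS[test]) if m in calls}
        for test in tests
    }


repo = Repository()
repo.store("ID-001", StoredSample(
    {"CFTR:var1": "het", "CFTR:var2": "ref", "MLH1:var1": "ref"}, "2011-10-13"))
first = respond(repo, "ID-001", ["CF"])      # original requisition
second = respond(repo, "ID-001", ["HNPCC"])  # later requisition for a different test
```

The MLH1 marker is sequenced alongside the CF markers but does not appear in the first response. It becomes reportable only when the second requisition asks for it.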

In certain embodiments, assigning an identifier includes rendering the source of the sample and the requisition anonymous. In certain embodiments, the second requisition includes the identifier. In certain embodiments, the information related to the analysis identifies the date of the analysis. In certain embodiments, generating a response to the requisition includes performing quality control on the results of an analysis of the sequence data. In certain embodiments, performing quality control includes performing a comparison using quality information derived from the repository. In certain embodiments, performing quality control includes updating the quality information using the results of the analysis of the sequence data. In certain embodiments, the repository is accessible to contributors through a portal. Certain of these embodiments update the repository based on contributions made by the contributors.
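Quality control compares each new result against quality information derived from the repository, then updates that information. One way to realize this is a running mean and standard deviation per QC metric. The sketch below uses Welford's online algorithm. Several details are assumptions, not given in this description: the metric (for example, mean read depth), the three-standard-deviation acceptance rule, and the choice to update only with passing results.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningQC:
    """Running mean/variance of one QC metric (Welford's online algorithm)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; reported as 0 for fewer than two values.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def passes(self, x: float, k: float = 3.0) -> bool:
        """Assumed rule: accept values within k standard deviations of the mean."""
        if self.n < 2:
            return True  # too little history to judge
        return abs(x - self.mean) <= k * self.std


def quality_control(qc: RunningQC, value: float) -> bool:
    """Compare a new result with repository history, then fold it in if it passes."""
    ok = qc.passes(value)
    if ok:
        qc.update(value)
    return ok


history = RunningQC()
for depth in [30.0, 31.0, 29.0, 30.0]:  # hypothetical prior runs
    history.update(depth)
```

Welford's method avoids keeping every past value in memory. Updating only with passing results keeps outliers from widening the acceptance band over time.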

In certain embodiments of the invention, systems comprise a sample processing production line. In certain embodiments, the production line includes a genomic DNA extractor configured to extract DNA from a biological sample. In certain embodiments, the production line includes a target amplifier configured to amplify components of the extracted DNA. In certain embodiments, the production line includes a sequencer that produces sequence data for a plurality of markers from the amplified components. In certain embodiments, the plurality of markers includes markers associated with one or more tests identified in a requisition received with the sample. Certain of these embodiments comprise a sample information management system (SIMS) that controls processing of the sample by the production line and analysis of the results of the processing of the sample. Certain of these embodiments comprise a quality control (QC) database that provides the SIMS with QC information. In certain embodiments, the SIMS uses the QC information to validate the processing of the sample and the analysis of the results. Certain of these embodiments comprise a repository comprising one or more databases. In certain embodiments, the repository aggregates the results generated by processing a plurality of samples. In certain embodiments, the repository includes the quality control database and a research database. Certain of these embodiments comprise an analyzer that generates the results using information in the repository.
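The SIMS control role can be sketched as a loop over production-line stages. Each stage runs for a sample and is validated against QC information before the next stage starts, and the loop halts at the first failed check. The stage names, state keys and thresholds below are hypothetical placeholders for the extractor, amplifier and sequencer.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
# (stage name, processing step, QC check applied to the step's output)
Stage = Tuple[str, Callable[[State], State], Callable[[State], bool]]


def run_production_line(sample_id: str, state: State, stages: List[Stage]) -> List[str]:
    """Run stages in order, logging each under the sample's identifier and
    stopping at the first stage whose QC check fails."""
    log: List[str] = []
    for name, step, check in stages:
        state = step(state)
        if not check(state):
            log.append(f"{sample_id}:{name}:FAILED_QC")
            break
        log.append(f"{sample_id}:{name}:ok")
    return log


# Hypothetical stages with toy outputs and thresholds.
EXAMPLE_STAGES: List[Stage] = [
    ("extract", lambda s: {**s, "dna_ng": 50.0}, lambda s: s["dna_ng"] >= 10.0),
    ("amplify", lambda s: {**s, "on_target": 0.9}, lambda s: s["on_target"] >= 0.8),
    ("sequence", lambda s: {**s, "mean_depth": 12.0}, lambda s: s["mean_depth"] >= 20.0),
]

log = run_production_line("ID-001", {}, EXAMPLE_STAGES)
```

Here the toy sequencing stage reports a mean depth below its threshold, so the run stops there. The failure is recorded against the sample identifier rather than any patient-identifying information.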

In certain embodiments, information identifying a source of the sample is removed from the sample, the requisition and the results. In certain embodiments, the SIMS controls the processing and analysis of the system using a unique identifier assigned to the sample, the requisition and the results. In certain embodiments, a subset of the results is delivered to the source of the sample. In certain embodiments, the subset of results corresponds to a set of tests identified in the requisition. In certain embodiments, the subset of the results and additional results are maintained in the repository. In certain embodiments, the additional results are aggregated in the research database.
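Source-identifying information can be removed while keeping a stable identifier that links the sample, the requisition and the results. One way to sketch this is with a keyed hash. The examples below speak of an encrypted ID. This sketch instead substitutes HMAC-SHA256 pseudonymization from the Python standard library, which is a different technique: one-way rather than decryptable. The identifying field names and the 16-character ID length are assumptions.

```python
import hashlib
import hmac
from typing import Dict, Tuple

# Hypothetical fields treated as identifying the source of a sample.
IDENTIFYING_FIELDS = {"patient_name", "date_of_birth", "address", "mrn"}


def deidentify(record: Dict[str, str], secret_key: bytes) -> Tuple[str, Dict[str, str]]:
    """Return (pseudonymous ID, copy of the record without identifying fields).

    The same key and medical record number (mrn) always yield the same ID, so a
    later requisition can be linked to stored results; without the key the ID
    cannot be recomputed from the mrn.
    """
    digest = hmac.new(secret_key, record["mrn"].encode("utf-8"), hashlib.sha256)
    token = digest.hexdigest()[:16]
    cleaned = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    return token, cleaned


key = b"lab-held secret"  # hypothetical; a real key would be held in a key store
id1, req1 = deidentify({"mrn": "12345", "patient_name": "Jane Doe", "tests": "CF"}, key)
id2, req2 = deidentify({"mrn": "12345", "tests": "HNPCC"}, key)
```

Because both requisitions carry the same record number, they map to the same identifier. The second requisition can therefore retrieve results already held in the repository.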

Certain of these embodiments comprise a portal that selectively provides access to data in the research database to a plurality of contributors. In certain embodiments, the portal communicates with the plurality of contributors via a public network. In certain embodiments, certain of the contributors provide additional research data to the research database. Certain of these embodiments comprise a data curator configured for use by the plurality of contributors. Certain of these embodiments comprise a data curator configured to process information provided to the research database. In certain embodiments, the information provided to the research database is obtained from public sources.

Certain embodiments of the invention include methods performed on a computer system that controls production line operations, analyzes physical results of the process, manages databases, gateways and/or portals, or controls intake of physical samples.

Although the present invention has been described with reference to specific exemplary embodiments, it will be evident to one of ordinary skill in the art that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

Certain aspects of the invention are further illustrated by the following examples, which simulate application of the invention to several hypothetical requisitions.

Example 1 Cystic Fibrosis and Modifiers Variant Requisition

In one example, a requisition is submitted for Cystic Fibrosis. The requisition originates from any of a clinician, company, partner, or individual. The requisition is submitted electronically, on paper, via web service, from an EMR/PHR, or by other means. The sample [any of blood, saliva, or cells, etc.] is de-identified and an encrypted ID is assigned. DNA from the sample associated with the requisition is purified and quality assessed for downstream processing. The sample is prepared for sequencing with respect to the regions of interest as defined by the Curation Database. All sequences relevant to any curated loci of interest and/or genes relevant for conditions of interest are targeted for downstream sequencing by a combination of targeting methodologies, in this case by targeted hybridization and ‘pull down’ of selected regions in addition to targeted amplification of selected regions. The sequencing is performed with one or more sequencing/analysis platforms, and quality is monitored through per-base, per-locus, per-coordinate and per-condition quality statistics, ultimately enabling sensitivity, specificity and accuracy statistics to be resolved on a per-base, per-locus, per-coordinate or per-condition basis. The Requisition Engine performs the secondary analysis on the regions of interest as indicated by the client in the sample requisition, in this case known variants, novel variants and variants of unknown significance in the CFTR, MBL2 and TGFB1 genes, which allows reporting on Cystic Fibrosis and known modifiers of Cystic Fibrosis as determined by the Locus Database. The client/patient ID informs the regions for the Analysis Engine to interpret, and per-condition results are determined. The data for the results are made available for delivery as any of [a PDF, API for integration into electronic records, fax, etc.] as per the client's instruction. In this case, the client's preferences are to receive results for known pathogenic variants, as well as novel variants and variants of unknown significance. The data may be used by the client directly, or reformatted into a clinical report summarizing the key information for a patient.
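Sensitivity, specificity and accuracy for one locus can be computed by comparing the variant call at each assessed position with known truth. The truth could come, for instance, from a well-characterized control sample run alongside patient samples. That source is an assumption; the description does not say where truth comes from. A minimal sketch with hypothetical names:

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class Metrics:
    sensitivity: float
    specificity: float
    accuracy: float


def call_metrics(called: Set[int], truth: Set[int], assessed: Set[int]) -> Metrics:
    """Score variant calls over the assessed positions of one locus.

    called:   positions where a variant was called.
    truth:    positions where a variant is known to be present.
    assessed: every position evaluated (e.g., all bases of the locus).
    """
    c, t = called & assessed, truth & assessed
    tp = len(c & t)   # variant called and present
    fp = len(c - t)   # variant called but absent
    fn = len(t - c)   # variant present but missed
    tn = len(assessed) - tp - fp - fn
    nan = float("nan")
    return Metrics(
        sensitivity=tp / (tp + fn) if tp + fn else nan,
        specificity=tn / (tn + fp) if tn + fp else nan,
        accuracy=(tp + tn) / len(assessed) if assessed else nan,
    )


m = call_metrics(called={10, 20, 30, 55}, truth={10, 20, 30, 40}, assessed=set(range(100)))
```

Restricting `assessed` to the bases of one locus, or pooling the loci of one condition, gives per-locus or per-condition figures respectively.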

Detailed Findings

Assay Methodology

PCR amplicons were designed to cover all coding exons plus 2 bp of flanking intron at each intron/exon boundary, as well as any previously observed pathogenic locus. Amplicons were amplified from patient DNA using PCR and sequenced using the 454 Jr. sequencer, and copy number variants relative to the reference genome GRCh37 (alias HG19) were determined using GATK.
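The target design rule reduces to padding and merging intervals: coding exons plus 2 bp of flanking intron, plus any known pathogenic positions. The sketch assumes 1-based inclusive coordinates on a single chromosome, and the exon and locus positions are hypothetical. Primer design itself is not shown.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive, one chromosome


def design_targets(exons: List[Interval], pathogenic_loci: List[int], pad: int = 2) -> List[Interval]:
    """Pad each coding exon by `pad` bp of flanking intron on both sides, add known
    pathogenic positions, and merge overlapping or abutting intervals."""
    raw = [(start - pad, end + pad) for start, end in exons]
    raw += [(pos, pos) for pos in pathogenic_loci]
    merged: List[Interval] = []
    for start, end in sorted(raw):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous target: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical exon coordinates and pathogenic positions (one inside an exon, one outside).
targets = design_targets([(100, 200), (203, 300), (500, 600)], [250, 1000])
```

The first two exons lie only 2 bp apart, so their padded intervals overlap and merge into a single target. The pathogenic position at 1000 becomes a target of its own.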

Appendix

Test Names:

Physicians can order a test by GeneName or ConditionName. However, we will report on conditions, because a gene can be involved in one or more conditions and it is possible that we have not curated variants for the other conditions.
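The ordering rule above accepts either a gene name or a condition name but reports by condition. It can be sketched as a lookup against a curated condition-to-gene table. The table below reuses the gene lists from Examples 1 and 2 purely for illustration. `CONDITION_GENES` and `conditions_for_order` are hypothetical names.

```python
from typing import Dict, List, Set

# Illustrative curated table: condition -> genes (gene lists taken from Examples 1 and 2).
CONDITION_GENES: Dict[str, Set[str]] = {
    "Cystic Fibrosis": {"CFTR", "MBL2", "TGFB1"},
    "HNPCC": {"EPCAM", "MLH1", "MSH2", "MSH6", "PMS2"},
}


def conditions_for_order(order: List[str]) -> List[str]:
    """Translate ordered gene or condition names into the conditions to report."""
    result: Set[str] = set()
    for name in order:
        if name in CONDITION_GENES:
            result.add(name)  # ordered by ConditionName
        else:
            # Ordered by GeneName: expand to every curated condition involving the gene.
            result.update(c for c, genes in CONDITION_GENES.items() if name in genes)
    return sorted(result)
```

A gene with no curated condition returns nothing. This matches the stated reason for reporting on conditions: variants may not have been curated for every condition a gene is involved in.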

Definitions

Ethnicity

Used for reporting certain conditions with quantitative risk models. Physicians can select one or more of the choices. We can produce three scores: one for Asian, one for AA/Black, and one for White. If the physician requests OTHER (or handwrites a non-Locus ethnic group), we will provide all three scores. If the physician requests Asian and White, we will provide two scores.
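The ethnicity rule can be expressed as a small function. The group labels follow the text. Treating an empty selection like OTHER is an assumption the description does not address.

```python
from typing import Iterable, List

SCORED_GROUPS = ["Asian", "AA/Black", "White"]


def scores_to_produce(selected: Iterable[str]) -> List[str]:
    """Return the ethnic groups for which quantitative risk scores are reported."""
    chosen = set(selected)
    # OTHER or a handwritten non-listed group yields all three scores.
    # Treating an empty selection the same way is an assumption.
    if not chosen or any(group not in SCORED_GROUPS for group in chosen):
        return list(SCORED_GROUPS)
    return [group for group in SCORED_GROUPS if group in chosen]
```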

Pathogenic (Known)—ACMG Category 1

We will look for and report on pathogenic variants that have been previously observed in affected individuals. Variants included in qualitative reports are considered pathogenic if they have been observed in an affected individual and result in a deleterious mutation (such as nonsense, truncation, disruption of a consensus splice site, or disruption of the initiator codon). Other types of mutations (such as missense, in-frame substitutions or deletions) are considered pathogenic if there is experimental evidence to support pathogenicity. Variants included in quantitative conditions are considered pathogenic if there are at least two independent association studies in the same ethnic group showing statistically significant association (after correction for multiple testing). If that same variant is also significantly associated in an additional ethnic group, it is considered pathogenic.

Pathogenic (Novel)—ACMG Category 2

Novel pathogenic mutations for qualitative reports are those that have not been described in an affected individual and result in a deleterious mutation (such as nonsense, truncation, disruption of a consensus splice site, or disruption of the initiator codon).

Ref Allele

Refers to the allele present at that coordinate in GRCh37. In some cases, the reference allele may be the risk allele.

Sequence

The consensus base call and coverage are provided for each base sequenced. In addition, a variant file is produced for the individual, containing information about positions that differ from the reference genome.
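Producing the variant file amounts to keeping the sequenced positions whose consensus call differs from the GRCh37 reference base. The sketch below handles single-base substitutions only. Real variant files (for example VCF) also describe insertions, deletions and genotype details, and the record layout here is an assumption.

```python
from typing import Dict, List, NamedTuple


class BaseCall(NamedTuple):
    base: str      # consensus base call
    coverage: int  # number of reads covering the position


class VariantRecord(NamedTuple):
    pos: int
    ref: str
    alt: str
    coverage: int


def variant_file(reference: Dict[int, str], calls: Dict[int, BaseCall]) -> List[VariantRecord]:
    """List every sequenced position whose consensus call differs from the reference."""
    return [
        VariantRecord(pos, reference[pos], call.base, call.coverage)
        for pos, call in sorted(calls.items())
        if call.base != reference[pos]
    ]


# Hypothetical reference bases and consensus calls for three positions.
reference = {1: "A", 2: "C", 3: "G"}
calls = {1: BaseCall("A", 30), 2: BaseCall("T", 25), 3: BaseCall("G", 40)}
variants = variant_file(reference, calls)
```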

Variant of Unknown Significance (VUS)—ACMG Categories 3-6

A variant of unknown significance may have been observed in affected individuals but does not meet the criteria of pathogenicity, or may be a novel variant with unclear functional effect. We provide frequency information and computational prediction for the functional effect of these variants.
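The category definitions above can be read as a decision procedure. This is a simplified sketch covering qualitative reports plus the quantitative association rule. The consequence labels and the `Evidence` fields are hypothetical. The full ACMG guidance involves many more criteria than are modeled here.

```python
from dataclasses import dataclass
from typing import Dict

# Consequence classes the definitions treat as deleterious (labels are hypothetical).
DELETERIOUS = {"nonsense", "truncation", "splice_site_disruption", "start_codon_disruption"}

KNOWN = "Pathogenic (Known), ACMG Category 1"
NOVEL = "Pathogenic (Novel), ACMG Category 2"
VUS = "Variant of Unknown Significance, ACMG Categories 3-6"


@dataclass
class Evidence:
    consequence: str                    # e.g. "nonsense", "missense", "inframe_deletion"
    seen_in_affected: bool              # previously observed in an affected individual
    experimental_support: bool = False  # experimental evidence of pathogenicity


def classify_qualitative(ev: Evidence) -> str:
    """Category for a variant in a qualitative report."""
    deleterious = ev.consequence in DELETERIOUS
    if ev.seen_in_affected and (deleterious or ev.experimental_support):
        return KNOWN
    if not ev.seen_in_affected and deleterious:
        return NOVEL
    return VUS


def pathogenic_quantitative(significant_studies: Dict[str, int]) -> bool:
    """Quantitative conditions: at least two independent studies in one ethnic group
    showing significant association after multiple-testing correction.

    significant_studies maps ethnic group -> number of such studies.
    """
    return any(count >= 2 for count in significant_studies.values())
```

A missense variant seen in an affected individual stays a VUS unless experimental evidence supports pathogenicity. That evidence is what the `experimental_support` flag models.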

General Disclaimer:

DNA studies do not constitute a definitive test for the selected condition(s) in all individuals. It should be realized that there are many possible sources of diagnostic error. Genotyping errors can result from trace contamination of PCR reactions and from rare genetic variants that interfere with analysis. This test is used for clinical purposes. It should not be regarded as investigational or for research. The laboratory is regulated under the Clinical Laboratory Improvement Amendments (CLIA) of 1988.

Example 2 Hereditary Non-Polyposis Colon Cancer (HNPCC) with Background Screen

In this example, a requisition is submitted for HNPCC. The requisition originates from any of a clinician, company, partner, or individual. The requisition is submitted electronically, on paper, via web service, from an EMR/PHR, or by other means. The sample [blood, saliva, or cells, etc.] is de-identified and an encrypted ID is assigned. DNA from the sample associated with the requisition is purified and quality assessed for downstream processing. The sample is prepared for sequencing with respect to the regions of interest as defined by the Curation Database. All sequence relevant to any curated loci of interest and/or genes relevant for conditions of interest is targeted for downstream sequencing by a combination of targeting methodologies, in this case by targeted hybridization and ‘pull down’ of selected regions in addition to targeted amplification of selected regions. The sequencing is performed with one or more sequencing/analysis platforms, and quality is monitored through per-base, per-locus, per-coordinate and per-condition quality statistics, ultimately enabling sensitivity, specificity and accuracy statistics to be resolved on a per-base, per-locus, per-coordinate or per-condition basis. The Requisition Engine performs the secondary analysis on the regions of interest as indicated by the client in the sample requisition, in this case known variants, novel variants and variants of unknown significance in the EPCAM, MLH1, MSH2, MSH6 and PMS2 genes, which allows reporting on HNPCC as determined by the Locus Database, as well as a background screen with pre-determined client thresholds on all other conditions in the Locus Database. The client/patient ID informs the regions for the Analysis Engine to interpret, and per-condition results are determined. The data for the results are made available for delivery as [a PDF, API for integration into electronic records, fax, etc.] as per the client's instruction. In this case, the client's preferences are to receive results for known pathogenic variants, as well as novel variants and variants of unknown significance. The data may be used by the client directly, or reformatted into a clinical report summarizing the key information for a patient.

Detailed Findings

Assay Methodology

PCR amplicons were designed to cover all coding exons plus 2 bp of flanking intron at each intron/exon boundary, as well as any previously observed pathogenic locus. Amplicons were amplified from patient DNA using PCR and sequenced using the 454 Jr. sequencer, and copy number variants relative to the reference genome GRCh37 (alias HG19) were determined using GATK.

Appendix

Test Names:

Physicians can order a test by GeneName or ConditionName. However, we will report on conditions, because a gene can be involved in one or more conditions and it is possible that we have not curated variants for the other conditions.

Definitions

Ethnicity

Used for reporting certain conditions with quantitative risk models. Physicians can select one or more of the choices. We can produce three scores: one for Asian, one for AA/Black, and one for White. If the physician requests OTHER (or handwrites a non-Locus ethnic group), we will provide all three scores. If the physician requests Asian and White, we will provide two scores.

Pathogenic (Known)—ACMG Category 1

We will look for and report on pathogenic variants that have been previously observed in affected individuals. Variants included in qualitative reports are considered pathogenic if they have been observed in an affected individual and result in a deleterious mutation (such as nonsense, truncation, disruption of a consensus splice site, or disruption of the initiator codon). Other types of mutations (such as missense, in-frame substitutions or deletions) are considered pathogenic if there is experimental evidence to support pathogenicity. Variants included in quantitative conditions are considered pathogenic if there are at least two independent association studies in the same ethnic group showing statistically significant association (after correction for multiple testing). If that same variant is also significantly associated in an additional ethnic group, it is considered pathogenic.

Pathogenic (Novel)—ACMG Category 2

Novel pathogenic mutations for qualitative reports are those that have not been described in an affected individual and result in a deleterious mutation (such as nonsense, truncation, disruption of a consensus splice site, or disruption of the initiator codon).

Ref Allele

Refers to the allele present at that coordinate in GRCh37. In some cases, the reference allele may be the risk allele.

Sequence

The consensus base call and coverage are provided for each base sequenced. In addition, a variant file is produced for the individual, containing information about positions that differ from the reference genome.

Variant of Unknown Significance (VUS)—ACMG Categories 3-6

A variant of unknown significance may have been observed in affected individuals but does not meet the criteria of pathogenicity, or may be a novel variant with unclear functional effect. We provide frequency information and computational prediction for the functional effect of these variants.

General Disclaimer:

DNA studies do not constitute a definitive test for the selected condition(s) in all individuals. It should be realized that there are many possible sources of diagnostic error. Genotyping errors can result from trace contamination of PCR reactions and from rare genetic variants that interfere with analysis. This test is used for clinical purposes. It should not be regarded as investigational or for research. The laboratory is regulated under the Clinical Laboratory Improvement Amendments (CLIA) of 1988.

Example 3 Colorectal Cancer with Background Screen

In this example, a requisition is submitted for Colorectal Cancer and a background screen for other conditions with pre-determined client thresholds. The requisition originates from any of a clinician, company, partner, or individual. The requisition is submitted electronically, on paper, via web service, from an EMR/PHR, or by other means. The sample [blood, saliva, or cells, etc.] is de-identified and an encrypted ID is assigned. DNA from the sample associated with the requisition is purified and quality assessed for downstream processing. The sample is prepared for sequencing with respect to the regions of interest as defined by the Curation Database. All sequence relevant to any curated loci of interest and/or genes relevant for conditions of interest is targeted for downstream sequencing by a combination of targeting methodologies, in this case by targeted hybridization and ‘pull down’ of selected regions in addition to targeted amplification of selected regions. The sequencing is performed with one or more sequencing/analysis platforms, and quality is monitored through per-base, per-locus, per-coordinate and per-condition quality statistics, ultimately enabling sensitivity, specificity and accuracy statistics to be resolved on a per-base, per-locus, per-coordinate or per-condition basis. The Requisition Engine performs the secondary analysis on the regions of interest as indicated by the client in the sample requisition, in this case known variants for Colorectal Cancer, as well as an analysis of all other conditions in the Locus Database that are pathogenic above a pre-determined client threshold. The client/patient ID informs the regions for the Analysis Engine to interpret, and per-condition results are determined. The data for the results are made available for delivery as [a PDF, API for integration into electronic records, fax, etc.] as per the client's instruction. In this case, the client's preferences are to receive results for known pathogenic variants, as well as the results of the threshold screen of all other conditions in the Locus Database. The data may be used by the client directly, or reformatted into a clinical report summarizing the key information for a patient.
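The background screen in this example reports the requested condition plus every other condition whose result clears a client-set threshold. The sketch assumes each condition already has a numeric score, for instance a risk estimate, and lets the client override the threshold per condition. Several details are hypothetical: the score scale, the condition names other than Colorectal Cancer, and the override mechanism.

```python
from typing import Dict, List, Optional


def build_report(scores: Dict[str, float], requested: List[str], threshold: float,
                 overrides: Optional[Dict[str, float]] = None) -> Dict[str, List[str]]:
    """Requested conditions are always reported; every other condition is
    included in the background screen only if its score meets its threshold."""
    overrides = overrides or {}
    background = sorted(
        condition for condition, score in scores.items()
        if condition not in requested and score >= overrides.get(condition, threshold)
    )
    return {"requested": [c for c in requested if c in scores], "background": background}


# Hypothetical scores on a 0-1 scale.
scores = {"Colorectal Cancer": 0.2, "Condition A": 0.9, "Condition B": 0.4, "Condition C": 0.7}
report = build_report(scores, ["Colorectal Cancer"], threshold=0.5)
strict = build_report(scores, ["Colorectal Cancer"], threshold=0.5,
                      overrides={"Condition C": 0.8})
```

The requested condition is reported regardless of its score. The background conditions must each clear their own threshold to appear.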

Detailed Findings

Assay Methodology

PCR amplicons were designed to cover all coding exons plus 2 bp of flanking intron at each intron/exon boundary, as well as any previously observed pathogenic locus. Amplicons were amplified from patient DNA using PCR and sequenced using the 454 Jr. sequencer, and copy number variants relative to the reference genome GRCh37 (alias HG19) were determined using GATK.

Appendix

Test Names:

Physicians can order a test by GeneName or ConditionName. However, we will report on conditions, because a gene can be involved in one or more conditions and it is possible that we have not curated variants for the other conditions.

Definitions

Ethnicity

Used for reporting certain conditions with quantitative risk models. Physicians can select one or more of the choices. We can produce three scores: one for Asian, one for AA/Black, and one for White. If the physician requests OTHER (or handwrites a non-Locus ethnic group), we will provide all three scores. If the physician requests Asian and White, we will provide two scores.

Pathogenic (Known)—ACMG Category 1

We will look for and report on pathogenic variants that have been previously observed in affected individuals. Variants included in qualitative reports are considered pathogenic if they have been observed in an affected individual and result in a deleterious mutation (such as nonsense, truncation, disruption of a consensus splice site, or disruption of the initiator codon). Other types of mutations (such as missense, in-frame substitutions or deletions) are considered pathogenic if there is experimental evidence to support pathogenicity. Variants included in quantitative conditions are considered pathogenic if there are at least two independent association studies in the same ethnic group showing statistically significant association (after correction for multiple testing). If that same variant is also significantly associated in an additional ethnic group, it is considered pathogenic.

Pathogenic (Novel)—ACMG Category 2

Novel pathogenic mutations for qualitative reports are those that have not been described in an affected individual and result in a deleterious mutation (such as nonsense, truncation, disruption of a consensus splice site, or disruption of the initiator codon).

Ref Allele

Refers to the allele present at that coordinate in GRCh37. In some cases, the reference allele may be the risk allele.

Sequence

The consensus base call and coverage are provided for each base sequenced. In addition, a variant file is produced for the individual, containing information about positions that differ from the reference genome.

Variant of Unknown Significance (VUS)—ACMG Categories 3-6

A variant of unknown significance may have been observed in affected individuals but does not meet the criteria of pathogenicity, or may be a novel variant with unclear functional effect. We provide frequency information and computational prediction for the functional effect of these variants.

General Disclaimer:

DNA studies do not constitute a definitive test for the selected condition(s) in all individuals. It should be realized that there are many possible sources of diagnostic error. Genotyping errors can result from trace contamination of PCR reactions and from rare genetic variants that interfere with analysis. This test is used for clinical purposes. It should not be regarded as investigational or for research. The laboratory is regulated under the Clinical Laboratory Improvement Amendments (CLIA) of 1988.

Example 4 Fragile X with Background Screen Starting with Client-Generated Sequence

In this example, a requisition is submitted for Fragile X and a background screen with pre-determined client thresholds, defined by any of a clinician, company, partner, or individual. The requisition is submitted electronically, on paper, via web service, from an EMR/PHR, or by other means, and includes the sequence to be analyzed by Locus Development. Per-base, per-locus, per-coordinate and per-condition quality statistics for the requisitioned sequence are determined by the Locus Analysis, ultimately enabling sensitivity, specificity and accuracy statistics to be resolved on a per-base, per-locus, per-coordinate or per-condition basis. The Requisition Engine performs the secondary analysis on the regions of interest as indicated by the client in the sample requisition, in this case known variants in the FMR1 gene, which allows reporting on Fragile X, Fragile X expansion, Fragile X-associated tremor and ataxia, and FMR1-related primary ovarian insufficiency, as well as a background threshold screen for the other conditions in the Locus Database. The client/patient ID informs the regions for the Analysis Engine to interpret, and per-condition results are determined. The data for the results are made available for delivery as [a PDF, API for integration into electronic records, fax, etc.] as per the client's instruction. In this case, the client's preferences are to receive results for known pathogenic variants, as well as novel variants and variants of unknown significance. The data may be used by the client directly, or reformatted into a clinical report summarizing the key information for a patient.
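When the client supplies the sequence, quality statistics have to be derived from what was sent. This sketch assumes the sequence arrives as FASTQ with Phred+33 quality encoding; the format is not specified here. Under that assumption, per-locus base-quality statistics can be computed as follows. The Q20 cutoff is a common convention, used here only for illustration.

```python
from statistics import mean
from typing import Dict, List


def phred33(quality: str) -> List[int]:
    """Decode a FASTQ quality string in Phred+33 (Sanger) encoding."""
    return [ord(ch) - 33 for ch in quality]


def locus_quality(qualities: List[int], min_q: int = 20) -> Dict[str, float]:
    """Summarize the base qualities observed at one locus."""
    return {
        "mean_q": mean(qualities),
        "frac_ge_min_q": sum(q >= min_q for q in qualities) / len(qualities),
        # Expected number of erroneous bases: sum of error probabilities 10^(-Q/10).
        "expected_errors": sum(10 ** (-q / 10) for q in qualities),
    }


stats = locus_quality(phred33("I5+"))  # qualities 40, 20 and 10
```

Summing per-base error probabilities gives an expected error count that can be compared across loci of different lengths or depths. Mean Q alone hides a few very poor bases.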

Detailed Findings

Assay Methodology

PCR amplicons were designed to cover all coding exons plus 2 bp of flanking intron at each intron/exon boundary, as well as any previously observed pathogenic locus. Amplicons were amplified from patient DNA using PCR and sequenced using the 454 Jr. sequencer, and copy number variants relative to the reference genome GRCh37 (alias HG19) were determined using GATK.

Appendix

Test Names:

Physicians can order a test by GeneName or ConditionName. However, we will report on conditions, because a gene can be involved in one or more conditions and it is possible that we have not curated variants for the other conditions.

Definitions

Ethnicity

Used for reporting certain conditions with quantitative risk models. Physicians can select one or more of the choices. We can produce three scores: one for Asian, one for AA/Black, and one for White. If the physician requests OTHER (or handwrites a non-Locus ethnic group), we will provide all three scores. If the physician requests Asian and White, we will provide two scores.

Pathogenic (Known)—ACMG Category 1

We will look for and report on pathogenic variants that have been previously observed in affected individuals. Variants included in qualitative reports are considered pathogenic if they have been observed in an affected individual and result in a deleterious mutation (such as nonsense, truncation, disruption of a consensus splice site, or disruption of the initiator codon). Other types of mutations (such as missense, in-frame substitutions or deletions) are considered pathogenic if there is experimental evidence to support pathogenicity. Variants included in quantitative conditions are considered pathogenic if there are at least two independent association studies in the same ethnic group showing statistically significant association (after correction for multiple testing). If that same variant is also significantly associated in an additional ethnic group, it is considered pathogenic.

Pathogenic (Novel)—ACMG Category 2

Novel pathogenic mutations for qualitative reports are those that have not been described in an affected individual and result in a deleterious mutation (such as nonsense, truncation, disruption of a consensus splice site, or disruption of the initiator codon).

Ref Allele

Refers to the allele present at that coordinate in GRCh37. In some cases, the reference allele may be the risk allele.

Sequence

The consensus base call and coverage are provided for each base sequenced. In addition, a variant file is produced for the individual, containing information about positions that differ from the reference genome.

Variant of Unknown Significance (VUS)—ACMG Categories 3-6

A variant of unknown significance may have been observed in affected individuals but does not meet the criteria of pathogenicity, or may be a novel variant with unclear functional effect. We provide frequency information and computational prediction for the functional effect of these variants.

General Disclaimer:

DNA studies do not constitute a definitive test for the selected condition(s) in all individuals. It should be realized that there are many possible sources of diagnostic error. Genotyping errors can result from trace contamination of PCR reactions and from rare genetic variants that interfere with analysis. This test is used for clinical purposes. It should not be regarded as investigational or for research. The laboratory is regulated under the Clinical Laboratory Improvement Amendments (CLIA) of 1988.

Example 5 Test for all Conditions in the Locus Database

In one example, a requisition is submitted for all conditions in the Locus Database. The requisition originates from any of a clinician, company, partner, or individual. The requisition is submitted electronically, on paper, via web service, from an EMR/PHR, or by other means. The sample [blood, saliva, or cells, etc.] is de-identified and an encrypted ID is assigned. DNA from the sample associated with the requisition is purified and quality assessed for downstream processing. The sample is prepared for sequencing with respect to the regions of interest as defined by the Curation Database. All sequence relevant to any curated loci of interest and/or genes relevant for conditions of interest is targeted for downstream sequencing by a combination of targeting methodologies, in this case by targeted hybridization and ‘pull down’ of selected regions in addition to targeted amplification of selected regions. The sequencing is performed with one or more sequencing/analysis platforms, and quality is monitored through per-base, per-locus, per-coordinate and per-condition quality statistics, ultimately enabling sensitivity, specificity and accuracy statistics to be resolved on a per-base, per-locus, per-coordinate or per-condition basis. The Requisition Engine performs the secondary analysis on the regions of interest as indicated by the client in the sample requisition, in this case for all conditions in the Locus Database. The client/patient ID informs the regions for the Analysis Engine to interpret, and per-condition results are determined. The data for the results are made available for delivery as [a PDF, API for integration into electronic records, fax, etc.].

Appendix

Test Names:

Physicians can order a test by GeneName or ConditionName. However, we will report on conditions, because a gene can be involved in one or more conditions and it is possible that we have not curated variants for the other conditions.

Definitions

Ethnicity

Used for reporting certain conditions with quantitative risk models. Physicians can select one or more of the choices. We can produce three scores: one for Asian, one for AA/Black, and one for White. If the physician requests OTHER (or handwrites a non-Locus ethnic group), we will provide all three scores. If the physician requests Asian and White, we will provide two scores.

Pathogenic (Known)—ACMG Category 1

We will look for and report on pathogenic variants that have been previously observed in affected individuals. Variants included in qualitative reports are considered pathogenic if they have been observed in an affected individual and result in a deleterious mutation (such as nonsense, truncation, disruption of a consensus splice site, or disruption of the initiator codon). Other types of mutations (such as missense, in-frame substitutions or deletions) are considered pathogenic if there is experimental evidence to support pathogenicity. Variants included in quantitative conditions are considered pathogenic if there are at least two independent association studies in the same ethnic group showing statistically significant association (after correction for multiple testing). If that same variant is also significantly associated in an additional ethnic group, it is considered pathogenic.

Pathogenic (Novel)—ACMG Category 2

Novel pathogenic mutations for qualitative reports are those that have not been described in an affected individual and result in a deleterious mutation (such as nonsense, truncation, disruption of a consensus splice site, or disruption of the initiator codon).

Ref Allele

Refers to the allele present at that coordinate in GRCh37. In some cases, the reference allele may be the risk allele.

Sequence

The consensus base call and coverage are provided for each base sequenced. In addition, a variant file is produced for the individual that contains information about positions that differ from the reference genome.
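
A sketch of producing the variant records described above from per-base consensus calls. It assumes single-nucleotide calls only, a reference held as one string per chromosome (GRCh37 in the text, a toy sequence in the usage below), and a hypothetical minimum-coverage threshold. A production pipeline would typically write such records in VCF and also handle insertions and deletions, which this sketch does not.

```python
from typing import Dict, List, NamedTuple


class BaseCall(NamedTuple):
    chrom: str
    pos: int       # 1-based coordinate
    base: str      # consensus base call
    coverage: int  # reads covering this position


def call_variants(calls: List[BaseCall], reference: Dict[str, str],
                  min_coverage: int = 10) -> List[dict]:
    """Return records for positions whose consensus call differs from the reference.

    min_coverage is a hypothetical threshold; low-coverage mismatches are skipped.
    """
    variants = []
    for call in calls:
        ref_base = reference[call.chrom][call.pos - 1].upper()
        if call.base.upper() != ref_base and call.coverage >= min_coverage:
            variants.append({
                "chrom": call.chrom,
                "pos": call.pos,
                "ref": ref_base,
                "alt": call.base.upper(),
                "coverage": call.coverage,
            })
    return variants
```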

Variant of Unknown Significance (VUS)—ACMG Categories 3-6

A variant of unknown significance may have been observed in affected individuals but not meet the criteria for pathogenicity, or may be a novel variant with an unclear functional effect. We provide frequency information and a computational prediction of the functional effect of these variants.
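
Taken together, the qualitative definitions above partition variants into three buckets. The sketch below assumes hypothetical mutation-type labels and does not attempt to separate ACMG Categories 3 through 6, for which the text gives no criteria; frequency information and functional prediction for VUS are also out of scope. The name `classify_qualitative` is illustrative.

```python
# Hypothetical labels for the mutation classes the text treats as deleterious.
DELETERIOUS_TYPES = {"nonsense", "truncation", "splice_site_disruption",
                     "initiator_codon_disruption"}


def classify_qualitative(observed_in_affected: bool, mutation_type: str,
                         has_experimental_evidence: bool) -> str:
    """Bucket a variant for a qualitative report using the definitions above."""
    deleterious = mutation_type in DELETERIOUS_TYPES
    if observed_in_affected and (deleterious or has_experimental_evidence):
        return "Pathogenic (Known), ACMG Category 1"
    if not observed_in_affected and deleterious:
        return "Pathogenic (Novel), ACMG Category 2"
    # Seen in affected individuals without meeting the criteria,
    # or novel with an unclear functional effect.
    return "VUS, ACMG Categories 3-6"
```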

General Disclaimer:

DNA studies do not constitute a definitive test for the selected condition(s) in all individuals. It should be realized that there are many possible sources of diagnostic error. Genotyping errors can result from trace contamination of PCR reactions and from rare genetic variants that interfere with analysis. This test is used for clinical purposes. It should not be regarded as investigational or for research. The laboratory is regulated under the Clinical Laboratory Improvement Amendments (CLIA) of 1988.

Example 6: Test for All Conditions in the Locus Database with Client-Generated Sequence

In this example, a requisition is submitted for all conditions in the Locus Database. The requisition originates from any of a clinician, company, partner, or individual, and is submitted electronically, on paper, via web service, from an EMR/PHR, or by other means, together with the sequence to be analyzed by Locus Development. Per-base, per-locus, per-coordinate, and per-condition quality statistics for the submitted sequence are determined by the Locus Analysis, ultimately enabling sensitivity, specificity, and accuracy statistics to be resolved on a per-base, per-locus, per-coordinate, or per-condition basis. The Requisition Engine performs the secondary analysis on the regions of interest indicated by the client in the sample requisition, in this case variants for all conditions in the Locus Database. The client/patient ID informs the regions for the Analysis Engine to interpret, and per-condition results are determined. The data for the results are made available for delivery as [a PDF, an API for integration into electronic records, fax, etc.] according to the client's instructions. In this case, the client prefers to receive results for known pathogenic variants as well as novel variants and variants of unknown significance. The data may be used by the client directly or reformatted into a clinical report summarizing the key information for a patient.
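
The sensitivity, specificity, and accuracy statistics mentioned above follow the standard confusion-matrix definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), and accuracy = (TP+TN)/(TP+TN+FP+FN). The sketch assumes each evaluated position can be compared with a known truth (for example, a reference material with established genotypes), which the text does not specify. The grouping key can be a base, locus, coordinate, or condition; ratios with a zero denominator are returned as `None`. The function and locus names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    return numerator / denominator if denominator else None


def per_locus_stats(observations: Iterable[Tuple[str, bool, bool]]
                    ) -> Dict[str, Dict[str, Optional[float]]]:
    """observations: (grouping key, truly variant, called variant) per evaluated position."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for key, truth, called in observations:
        if truth and called:
            outcome = "tp"
        elif called:   # variant called where the truth has none
            outcome = "fp"
        elif truth:    # true variant missed
            outcome = "fn"
        else:
            outcome = "tn"
        counts[key][outcome] += 1

    stats = {}
    for key, c in counts.items():
        total = c["tp"] + c["fp"] + c["tn"] + c["fn"]
        stats[key] = {
            "sensitivity": _ratio(c["tp"], c["tp"] + c["fn"]),
            "specificity": _ratio(c["tn"], c["tn"] + c["fp"]),
            "accuracy": _ratio(c["tp"] + c["tn"], total),
        }
    return stats
```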

1. A method for analyzing diagnostic information, comprising: extracting genomic DNA from a sample, obtaining sequence data for a plurality of markers, the plurality of markers including a first set of markers which are associated with one or more tests identified in a first requisition corresponding to the sample and a second set of markers which are not associated with the one or more tests identified in the requisition; generating a response to the requisition, the response being based upon an analysis of sequence data corresponding to the first set of markers and the one or more tests with the generation of the response excluding analysis of the second set of markers with respect to tests other than the one or more tests; assigning an identifier to the sequence data, the identifier identifying the sample, the requisition and information related to the analysis; storing the sequence data in a repository of data, the repository providing data for analysis; and reporting additional results of an analysis of the sequence data associated with one or more markers of the second set of markers in response to a second requisition, the second requisition identifying a test different from the one or more tests.
2. The method of claim 1, wherein assigning an identifier includes rendering the source of the sample and the requisition anonymous.
3. The method of claim 2, wherein the second requisition includes the identifier.
4. The method of claim 2, wherein the information related to the analysis identifies the date of the analysis.
5. The method of claim 1, wherein generating a response to the requisition includes performing quality control on the results of an analysis of the sequence data.
6. The method of claim 5, wherein performing quality control includes performing a comparison using quality information derived from the repository.
7. The method of claim 6, wherein performing quality control includes updating the quality information using the results of the analysis of the sequence data.
8. The method of claim 1, wherein the repository is accessible to contributors through a portal.
9. The method of claim 8, further comprising the step of updating the repository based on contributions made by the contributors.
10. The method of claim 1, wherein the second requisition is made after receiving the results of the first requisition.
11. The method of claim 1, wherein the test identified in the second requisition was not available for identification at the time the first requisition was made.
12. The method of claim 11, wherein the second requisition was made at the time of the first requisition and comprised a request for additional results associated with one or more tests not then available for requisition, with such results to be delivered after the occurrence of the availability for requisition of at least one of such one or more tests which were not available at the time the first requisition was made.
13. A method for analyzing diagnostic information, comprising: extracting genomic DNA from a sample, obtaining sequence data for a plurality of markers, the plurality of markers including a first set of markers which are associated with one or more tests identified in a first requisition corresponding to the sample and a second set of markers which are not associated with the one or more tests identified in the requisition, wherein the first requisition includes a request to receive future notifications of health information associated with the sequence data; generating a response to the requisition, the response being based upon an analysis of sequence data corresponding to the first set of markers and the one or more tests with the generation of the response excluding analysis of the second set of markers with respect to tests other than the one or more tests; assigning an identifier to the sequence data, the identifier identifying the sample, the requisition and information related to the analysis; storing the sequence data in a repository of data, the repository providing data for analysis; after generating the response, generating a notification of health information based upon an analysis of some of the sequence data wherein the health information which is the subject of the notification comprises health information which is different from the health information in the response; and generating such health information after consent for its generation is received.
14. The method of claim 13, wherein the health information which is the subject of the notification concerns a change to a prediction in the response where such change arises due to a change in the content of the repository.
15. The method of claim 13, wherein the health information which is the subject of the notification concerns one or more tests which were not available for requisition at the time of the first requisition.
16. The method of claim 15, wherein the health information which is the subject of the notification is generated with reference to sequence data corresponding to at least one marker of the second set of markers.
17. The method of claim 13, wherein the consent for generating such health information is provided with the first requisition.
18. A system for analyzing diagnostic information, comprising: a sample processing production line including a genomic DNA extractor configured to extract DNA from a biological sample, a target amplifier configured to amplify components of the extracted DNA, and a sequencer that produces sequence data for a plurality of markers from the amplified components, the plurality of markers including markers associated with one or more tests identified in a requisition received with the sample; a sample information management system (SIMS) that controls processing of the sample by the processing production line and analysis of the results of the processing of the sample; a quality control (QC) database that provides the SIMS with QC information, the SIMS using the QC information to validate the processing of the sample and the analysis of the results; a repository comprising one or more databases, the repository aggregating the results generated by processing a plurality of samples, the repository including the quality control database and a research database; and an analyzer that generates the results using information in the repository.
19. The system of claim 18, wherein information identifying a source of the sample is removed from the sample, the requisition and the results, and wherein the SIMS controls the processing and analysis of the system using a unique identifier assigned to the sample, the requisition and the results.
20. The system of claim 18, wherein a subset of the results is delivered to the source of the sample, the subset of results corresponding to a set of tests identified in the requisition.
21-28. (canceled)